THE ANNALS 
of 
MATHEMATICAL 


STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


Contents

                                                                      PAGE

Sample Criteria for Testing Equality of Means, Equality of Variances,
  and Equality of Covariances in a Normal Multivariate Distribution.

Contributions to the Theory of Sequential Analysis II, III. M. A. GIRSHICK

Sufficient Statistical Estimation Functions for the Parameters of the
  Distribution of Maximum Values. BRADFORD F. KIMBALL ... 299

On Functions of Sequences of Independent Chance Vectors with
  Applications to the Problem of the “Random Walk” in k Dimensions.
  D. BLACKWELL AND M. A. GIRSHICK

Approximation of the Distribution of the Product of Beta Variables
  by a Single Beta Variable. JOHN W. TUKEY AND S. S. WILKS ... 318

Some Fundamental Curves for the Solution of Sampling Problems. ... 325

Enlargement Methods for Computing the Inverse Matrix. LOUIS

The Frequency Distribution of Deviates from Means and Regression
  Lines in Samples from a Multivariate Normal Population. D. J. FINNEY

On the Asymptotic Distributions of Certain Statistics used in Testing
  the Independence between Successive Observations from a Normal
  Population. P. L. HSU

Notes:
Estimating the Parameters of a Rectangular Distribution. A. GEORGE CARLTON

On the Power Function of the Sign Test for Slippage of Means.

An Approximation to the Probability Integral.

Distribution of the Ratio of Sample Range to Sample Standard Deviation
  for Normal and Combinations of Normal Distributions. G. A.

Vol. XVII, No. 3 — September, 1946 





THE ANNALS
OF MATHEMATICAL STATISTICS

EDITED BY
S. S. WILKS, Editor
C. C. CRAIG          W. FELLER          J. NEYMAN
ALLEN T. CRAIG       THORNTON C. FRY    WALTER A. SHEWHART
W. EDWARDS DEMING    HAROLD HOTELLING   A. WALD


WITH THE COOPERATION OF

WILLIAM G. COCHRAN   PAUL S. DWYER        ALEXANDER M. MOOD
J. H. CURTISS        CHURCHILL EISENHART  HENRY SCHEFFÉ
J. F. DALY           PAUL R. HALMOS       JOHN W. TUKEY
HAROLD F. DODGE      PAUL G. HOEL         JACOB WOLFOWITZ
WILLIAM G. MADOW


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the Institute
of Mathematical Statistics, P. S. Dwyer, 116 Rackham Hall, University of
Michigan, Ann Arbor, Mich.

Changes in mailing address which are to become effective for a given
issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December. Because of war-time difficulties of publication,
issues may often be from two to four weeks late in appearing.
Subscribers are therefore requested to wait at least 30 days after month of issue
before making inquiries concerning non-delivery.


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $5.00 per year. Single copies $1.50. 
Back numbers are available at $5.00 per volume, or $1.50 per single issue. 


COMPOSED AND PRINTED AT THE 
WAVERLY PRESS, Inc. 
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879 








SAMPLE CRITERIA FOR TESTING EQUALITY OF MEANS, EQUALITY 
OF VARIANCES, AND EQUALITY OF COVARIANCES IN A 
NORMAL MULTIVARIATE DISTRIBUTION 


By S. S. WILKS


Princeton University 


Summary. In this paper statistical test criteria are developed for testing
equality of means, equality of variances and equality of covariances in a normal
multivariate population of k variables on the basis of a sample. More
specifically, three statistical hypotheses are considered: (i) $H_{mvc}$, the hypothesis
that the means are equal, the variances are equal, and the covariances are
equal, (ii) $H_{vc}$, the hypothesis that variances are equal and covariances are
equal, irrespective of the values of the means, and (iii) $H_m$, the hypothesis of
equal means, assuming variances are equal and covariances are equal.

Test criteria $L_{mvc}$, $L_{vc}$, and $L_m$ are developed by the Neyman-Pearson method
of likelihood ratios for testing $H_{mvc}$, $H_{vc}$ and $H_m$ respectively. The exact
moments of each of the three test criteria when the three corresponding
hypotheses are true are determined for any number k of variables and for any size,
n, of the sample for which the distributions exist. The exact distributions of
$L_{mvc}$ and $L_{vc}$ are determined for k = 2 and k = 3, and the exact distribution of
$L_m$ is found for any k; these are all beta (Pearson Type I) distributions. Tables
of 5% and 1% points of $L_{mvc}$, $L_{vc}$ and $L_m$, based on Thompson’s tables of
percentage points of the Incomplete Beta Function, are given for certain values
of k and n (Tables I and II). Also tables of values of approximate 5% and 1%
points of $-n \ln L_{mvc}$, $-n \ln L_{vc}$ and $-n(k-1) \ln L_m$ for large values of n are
given (Table III), based on the fact that these three quantities are approximately
distributed according to chi-square laws for large values of n with
$\tfrac{1}{2}k(k+3) - 3$, $\tfrac{1}{2}k(k+1) - 2$, and $k - 1$ degrees of freedom respectively.
A table (Table IV) is given which shows how accurate the resulting approximate
5% and 1% points of $L_{mvc}$, $L_{vc}$ and $L_m$ are.

The paper is written in two parts. In Part I the problem of testing the three 
hypotheses is discussed and the mathematical results are presented together 
with an illustrative example. Part II is given for the reader who wishes to study 
the mathematical derivation of the results. 


I. THE PROBLEM AND A STATEMENT OF RESULTS


1.1. Introduction. Situations occasionally arise, in which it may be desired 
to test the hypothesis that the means are equal, the variances are equal and the 
covariances are equal in a multivariate population in which the variables are 
correlated, the test to be made on the basis of a sample from such a population. 
In the case of a normal multivariate distribution this means testing the hypo- 
thesis that the distribution is symmetric with respect to the variables. 




As an example,¹ suppose three “parallel forms” of a test are constructed and
all are given to a group of n college entrance students. On the basis of the 
scores of the n students on the three tests, how could one test the hypothesis 
that the three tests are really parallel forms, as far as means, variances and 
covariances are concerned? In other words, how could one test the hypo- 
thesis that the scores can be regarded as being from a sample of individuals 
from a college entrance population of individuals in which the distribution 
function of the three variables is such that the means of the three variables are 
all equal, the variances are equal and the covariances are equal? Actually, as 
far as practical considerations are concerned in testing work, it is frequently 
sufficient to consider only normally distributed populations. One may therefore
raise the question as to how to test the hypothesis that the three-variable
sample can be considered as having come from a normal three-variable popula- 
tion which is symmetrical in the three variables, i.e. a normal population in 
which the means are equal, the variances are equal, and the covariances are 
equal. Or more generally, one may raise the analogous question for the case 
of k variables. 

Similarly, one could mention biological examples which have been treated by 
intra-class correlation methods and raise the question as to whether the under- 
lying multivariate distribution can be judged to be symmetric in the variables 
on the basis of information supplied by the sample. 

To attempt to deal with this problem by comparing means, or variances or 
covariances two at a time or performing what might appear to be extensions of 
existing tests for two or more independent samples of one variable leads to com- 
plications because of correlation among the variables in the original population. 
What is needed is some kind of a comprehensive test which will take into account 
all means, variances and covariances at one time. If it turns out that the hypoth- 
esis of equal means, equal variances and equal covariances is not supported 
by the sample, then one can raise the question as to whether the sample supports 
the hypothesis that the variances are equal and covariances are equal irrespective 
of means. If the answer is yes here, one can ask the further question as to 
whether the sample supports the hypothesis of equal means. Such tests will be 
developed in this paper for samples from a normal multivariate population. 
More specifically, three tests are developed: (i) test $L_{mvc}$ for testing the
hypothesis $H_{mvc}$ that all means are equal, all variances are equal and all covariances
are equal, (ii) test $L_{vc}$ for the hypothesis $H_{vc}$ that all variances are equal and
all covariances are equal, irrespective of the values of the means, and (iii) test
$L_m$ for the hypothesis $H_m$ that the means are equal, assuming that $H_{vc}$ is true,
i.e. that the variances are equal and the covariances equal.


¹ The problem treated in this paper arose from discussions with Professor Harold O.
Gulliksen, of the Psychology Department of Princeton University, in connection with the
problem of testing whether two or more forms of an examination can be considered as
“parallel forms”. The author would like to take this opportunity to acknowledge various
helpful discussions he has also had with his colleague Professor John W. Tukey in
connection with this paper.

There are rather obvious extensions of the hypotheses $H_{mvc}$, $H_{vc}$ and $H_m$,
and their corresponding test criteria. For example, one could divide the
variables in the multivariate population into two sets, and consider the hypothesis
$H^{(2)}_{mvc}$ (say), analogous to $H_{mvc}$, that the means are equal, the variances are equal
and the covariances are equal within each of the two sets and that the covariances
of variables between the two sets are all equal. Similarly, $H^{(2)}_{vc}$ and $H^{(2)}_m$ could
be defined so as to be analogous to $H_{vc}$ and $H_m$. However, these extensions
will not be considered in this paper.

In Part I of this paper we shall discuss the problem of testing hypotheses 
regarding equality of means, equality of variances, and equality of covariances 
in a normal multivariate population, and summarize the mathematical results 
which have been obtained. An illustrative example will also be given. The 
derivation of the test criteria and their sampling theory is presented in Part II 
of the paper. 


1.2. The hypotheses to be tested. We assume that there is a k-variate
population $\Pi$ in which the variables $x_1, x_2, \ldots, x_k$ are distributed according to
a normal k-variate probability density function such that the mean value of
$x_i$ is $a_i$ ($i = 1, 2, \ldots, k$) and the variance-covariance matrix of $x_1, x_2, \ldots, x_k$
is $\|\rho_{ij}\sigma_i\sigma_j\|$, $\rho_{ij}$ being the correlation coefficient between $x_i$ and $x_j$ ($i \neq j$), and
$\sigma_i$ being the standard deviation of $x_i$.

In specifying the hypotheses to be considered it will be convenient to define
three conditions on the parameters of population $\Pi$:

Condition $C_m$: that the means of the $x_i$ are all equal.

Condition $C_v$: that the variances of the $x_i$ are all equal.

Condition $C_c$: that the covariances of the $x_i$ and $x_j$ ($i \neq j$) are all equal.

The hypotheses regarding $\Pi$ to be tested are as follows:

$H_{mvc}$: that conditions $C_m$, $C_v$, and $C_c$ hold

$H_{vc}$: that conditions $C_v$ and $C_c$ hold

$H_m$: that condition $C_m$ holds, assuming that $H_{vc}$ is true.

A precise statement of these hypotheses in terms of Neyman-Pearson
likelihood ratio terminology will be found in Part II.

It should be noted that $H_{mvc}$ is a comprehensive hypothesis which specifies
equality of means, equality of variances and equality of covariances and would
be tested if one is interested in all of these quantities as a system. On the other
hand $H_{vc}$ refers only to equality of variances and equality of covariances
regardless of what values the means may have. $H_{vc}$ would be tested if one is only
concerned with equality of variances and equality of covariances. $H_m$ is a more
restrictive hypothesis than either $H_{mvc}$ or $H_{vc}$, for it refers to equality of
means under the assumption that $H_{vc}$ is true. In other words, $H_m$ can only be
tested accurately when $H_{vc}$ is true; $H_m$ would be a generalization of the
Behrens-Fisher problem [1] when $H_{vc}$ is false.
















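1.3. The sample test criteria. The three hypotheses $H_{mvc}$, $H_{vc}$ and $H_m$ are
to be tested on the basis of a sample $O_n$ from $\Pi$ consisting of the following values
of the x's: $x_{i\alpha}$, $i = 1, 2, \ldots, k$; $\alpha = 1, 2, \ldots, n$.

The criteria for testing $H_{mvc}$, $H_{vc}$, and $H_m$ depend on the following quantities
to be determined from the sample:

(1.1)   $\bar{x}_i = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}, \qquad \bar{x} = \frac{1}{k}\sum_{i=1}^{k} \bar{x}_i$

(1.2)   $s_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha} - \bar{x}_i\bar{x}_j$

(1.3)   $s^2 = \frac{1}{k}\sum_{i=1}^{k} s_{ii}, \qquad s^2 r = \frac{1}{k(k-1)}\sum_{i \neq j} s_{ij}$


The sample criteria, based on the method of likelihood ratios, for testing
$H_{mvc}$, $H_{vc}$ and $H_m$ are respectively, as follows:

(1.4)   $L_{mvc} = \dfrac{|s_{ij}|}{(s^2)^k \,[1 + (k-1)r]\,\left[(1 - r) + \frac{1}{(k-1)s^2}\sum_{i=1}^{k}(\bar{x}_i - \bar{x})^2\right]^{k-1}}$

(1.5)   $L_{vc} = \dfrac{|s_{ij}|}{(s^2)^k (1 - r)^{k-1}\,[1 + (k-1)r]}$

(1.6)   $L_m = \dfrac{(1 - r)}{(1 - r) + \frac{1}{(k-1)s^2}\sum_{i=1}^{k}(\bar{x}_i - \bar{x})^2}$

where $|s_{ij}|$ is the determinant of sample variances and covariances.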

The range of values of each of the three criteria is from 0 to 1. A necessary 
and sufficient condition for each criterion to have the value 1 is that the hypoth- 
esis for which the criterion is a test be (accidentally) identically supported 
by the sample. If the hypothesis (any one of the three being considered) is 
true, the average value of the corresponding criterion will be less than 1, but 
this average value will be nearer 1 than when the hypothesis is false. 
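
The computation of the three criteria from raw scores can be sketched as follows. This is only an illustrative sketch, not part of the original paper: the function names `det` and `wilks_criteria` and the small data set are hypothetical, the divisor n is used for $s_{ij}$, and the criteria are coded in the forms $L_{vc} = |s_{ij}| / \{(s^2)^k (1-r)^{k-1}[1+(k-1)r]\}$ and $L_m = (1-r)/\{(1-r) + \sum(\bar{x}_i - \bar{x})^2/[(k-1)s^2]\}$, with $L_{mvc}$ having the bracket of $L_m$ raised to the power $k-1$ in its denominator.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    size, d = len(a), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, size):
            f = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return d


def wilks_criteria(x):
    """Return (L_mvc, L_vc, L_m) for x[i][alpha]: k variables by n individuals."""
    k, n = len(x), len(x[0])
    xbar_i = [sum(row) / n for row in x]          # mean of each variable
    xbar = sum(xbar_i) / k                        # grand mean
    # sample variances and covariances, divisor n
    s = [[sum((x[i][a] - xbar_i[i]) * (x[j][a] - xbar_i[j]) for a in range(n)) / n
          for j in range(k)] for i in range(k)]
    s2 = sum(s[i][i] for i in range(k)) / k       # average variance
    s2r = sum(s[i][j] for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    r = s2r / s2
    spread = sum((m - xbar) ** 2 for m in xbar_i) / ((k - 1) * s2)
    d = det(s)
    l_vc = d / (s2 ** k * (1 - r) ** (k - 1) * (1 + (k - 1) * r))
    l_m = (1 - r) / ((1 - r) + spread)
    l_mvc = d / (s2 ** k * (1 + (k - 1) * r) * ((1 - r) + spread) ** (k - 1))
    return l_mvc, l_vc, l_m


# Made-up scores for k = 3 variables and n = 5 individuals (illustration only).
scores = [[1, 2, 3, 4, 6], [2, 1, 4, 3, 7], [1, 3, 2, 5, 5]]
l_mvc, l_vc, l_m = wilks_criteria(scores)
print(l_mvc, l_vc, l_m)
```

With the criteria written this way, $L_{mvc} = L_{vc} L_m^{k-1}$ holds identically, which gives a convenient check on any computation.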

If $H_{mvc}$ is true (i.e., found to be supported by the sample on the basis of the
test $L_{mvc}$) then there will be three parameters which characterize $\Pi$, namely, $a$
(the common mean), $\sigma^2$ (the common variance), and $\rho$ (the common correlation
coefficient). The best estimates of these three parameters are, respectively:

(1.7)   $\bar{x} = \frac{1}{k}\sum_{i=1}^{k}\bar{x}_i, \qquad \tilde{s}^2 = s^2 + \frac{1}{k}\sum_{i=1}^{k}(\bar{x}_i - \bar{x})^2, \qquad \tilde{r} = \left[s^2 r - \frac{1}{k(k-1)}\sum_{i=1}^{k}(\bar{x}_i - \bar{x})^2\right] \Big/ \tilde{s}^2.$

If $H_{vc}$ is true (i.e., found to be supported by the sample on the basis of the
test $L_{vc}$) there will be k + 2 parameters which characterize $\Pi$, namely the means
$a_1, a_2, \ldots, a_k$, $\sigma^2$ (the common variance) and $\rho$ (the common correlation
coefficient). The best estimates of these parameters are, respectively

(1.8)   $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k, s^2$, and $r$.


In order to be able to use the three sample criteria $L_{mvc}$, $L_{vc}$ and $L_m$ for testing
the hypotheses $H_{mvc}$, $H_{vc}$, $H_m$, it is necessary to have their distribution
functions under the assumptions that the respective hypotheses $H_{mvc}$, $H_{vc}$ and $H_m$
are true.


1.4. Sampling theory of the test criteria. The moments of the exact sampling
distributions of $L_{mvc}$ and $L_{vc}$ when $H_{mvc}$ and $H_{vc}$ are true respectively, have been
determined for all values of k (number of variables) and all values of n (sample
size) for which such distributions exist; i.e., for k ≥ 2 and n > k. The g-th
moments of the distributions of the two criteria are as follows:

(1.9)   $M_g(L_{mvc}) = (k-1)^{g(k-1)} \prod_{i=2}^{k} \dfrac{\Gamma(\frac{1}{2}(n-i) + g)}{\Gamma(\frac{1}{2}(n-i))} \cdot \dfrac{\Gamma(\frac{1}{2}n(k-1))}{\Gamma(\frac{1}{2}n(k-1) + g(k-1))}$

and

(1.10)  $M_g(L_{vc}) = (k-1)^{g(k-1)} \prod_{i=2}^{k} \dfrac{\Gamma(\frac{1}{2}(n-i) + g)}{\Gamma(\frac{1}{2}(n-i))} \cdot \dfrac{\Gamma(\frac{1}{2}(k-1)(n-1))}{\Gamma(\frac{1}{2}(k-1)(n-1) + g(k-1))}.$

For the cases of k = 2 and k = 3, these moments simplify so that the
distribution functions of $L_{mvc}$ and $L_{vc}$ can be readily inferred. They turn out to be as
follows:

For k = 2:

(1.11)  $dF(L_{mvc}) = \frac{1}{2}(n-2)\,(L_{mvc})^{\frac{1}{2}(n-4)}\, dL_{mvc}$

(1.12)  $dF(L_{vc}) = \dfrac{\Gamma(\frac{1}{2}(n-1))}{\Gamma(\frac{1}{2}(n-2))\,\Gamma(\frac{1}{2})}\, L_{vc}^{\frac{1}{2}(n-4)} (1 - L_{vc})^{-\frac{1}{2}}\, dL_{vc}.$

For k = 3:

(1.13)  $dF(L_{mvc}) = \frac{1}{2}(n-1)(n-2)(n-3)\,(\sqrt{L_{mvc}})^{n-4} (1 - \sqrt{L_{mvc}})^{2}\, d\sqrt{L_{mvc}}$

(1.14)  $dF(L_{vc}) = (n-2)(n-3)\,(\sqrt{L_{vc}})^{n-4} (1 - \sqrt{L_{vc}})\, d\sqrt{L_{vc}}.$

The distribution function of $L_m$ when the hypothesis $H_m$ is true has been found
to be

(1.15)  $dF(L_m) = \dfrac{\Gamma(\frac{1}{2}n(k-1))}{\Gamma(\frac{1}{2}(n-1)(k-1))\,\Gamma(\frac{1}{2}(k-1))}\, L_m^{\frac{1}{2}(n-1)(k-1)-1} (1 - L_m)^{\frac{1}{2}(k-1)-1}\, dL_m.$

Details of the derivation of these distribution functions will be found in Part
II.

In a paper published elsewhere in the present issue of the Annals of
Mathematical Statistics, Tukey and Wilks [2] show how the probability integrals of
$L_{mvc}$ and $L_{vc}$ and of other statistical criteria having moments of a rather general
class can be fitted by Incomplete Beta Functions in such a way that all moments 
of the fitted distribution agree with those of the actual distribution up to and 
including terms of order : 

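It will be noted that the probability integrals of $L_{mvc}$ and $L_{vc}$ for k = 2, those
of $\sqrt{L_{mvc}}$ and $\sqrt{L_{vc}}$ for k = 3, and that of $L_m$ for any value of k, are Incomplete
Beta Functions [3], with the following values of p and q:

criterion                     p                          q

$L_{mvc}$ (k = 2)             $\frac{1}{2}(n-2)$                 1
$L_{vc}$ (k = 2)              $\frac{1}{2}(n-2)$                 $\frac{1}{2}$
$\sqrt{L_{mvc}}$ (k = 3)      $n-3$                      3
$\sqrt{L_{vc}}$ (k = 3)       $n-3$                      2
$L_m$ (any k)                 $\frac{1}{2}(n-1)(k-1)$            $\frac{1}{2}(k-1)$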





Percentage points of the distributions of these criteria for the cases
mentioned in this table can therefore be read from Thompson’s [4] tables of per cent
points for the Incomplete Beta Function. 5% and 1% points for $L_{mvc}$ and $L_{vc}$
for k = 2 and 3 are given in Table I for certain values of n. Table II shows
5% and 1% points of $L_m$ for certain values of n for k = 2, 3, 4, 5 and 6.
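
In two of the cases above the percentage points can be obtained without tables, which gives a check on the tabulated values. The following is a hedged sketch, not the method used to build the paper's tables (those were read from Thompson's tables). The function names are hypothetical. It assumes that for k = 2 the probability integral of $L_{mvc}$ is $L^{\frac{1}{2}(n-2)}$, and that for k = 3 the quantity $\sqrt{L_{vc}}$ follows a beta law with p = n - 3, q = 2, whose probability integral is $(n-2)w^{n-3} - (n-3)w^{n-2}$.

```python
def lmvc_point_k2(n, eps):
    """100*eps % point of L_mvc for k = 2; the probability integral is L ** ((n - 2) / 2)."""
    return eps ** (2.0 / (n - 2))


def lvc_point_k3(n, eps, tol=1e-12):
    """100*eps % point of L_vc for k = 3, where w = sqrt(L_vc) is Beta(n - 3, 2)."""
    def cdf(w):
        # Closed form of the Beta(n - 3, 2) probability integral.
        return (n - 2) * w ** (n - 3) - (n - 3) * w ** (n - 2)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:          # bisection; cdf is increasing on [0, 1]
        mid = 0.5 * (lo + hi)
        if cdf(mid) < eps:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2  # convert from sqrt(L_vc) back to L_vc


print(round(lmvc_point_k2(30, 0.05), 4))  # 0.8074
print(round(lmvc_point_k2(30, 0.01), 4))  # 0.7197
print(lvc_point_k3(33, 0.05))             # close to the tabulated .7326
```

The first two printed values agree with the exact 5% and 1% points quoted for $L_{mvc}$ with k = 2, n = 30.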


1.5. The equivalence of $L_m$ and an analysis of variance test for a k by n
layout. One can set up a Snedecor F ratio for testing hypothesis $H_m$ by setting

(1.16)  $F = \dfrac{\frac{1}{2}(n-1)(k-1)(1 - L_m)}{\frac{1}{2}(k-1)L_m}$

and entering the F tables with $n_1 = k - 1$ and $n_2 = (n-1)(k-1)$ degrees of


² The 100ε% point, say $L_\epsilon$, of a given criterion L (any of those being considered) having
distribution dF(L) is given by $\int_0^{L_\epsilon} dF(L) = \epsilon$.












TABLE I
5% and 1% points of $L_{mvc}$ and $L_{vc}$ for k = 2 and k = 3

k = 2

   n     $L_{mvc}$ 5%   $L_{mvc}$ 1%   $L_{vc}$ 5%   $L_{vc}$ 1%
   3        …          .0001        .0062       .0002
   4        …          .0100        .0975       .0199
   5        …          .0464        .2285       .0808
   6        …          .1000        .3416       .1588
   7        …          .1585        .4307       .2352
   8        …          .2154        .5005       .3039
   9        …          .2683        .5559       .3637
  10        …          .3162        .6007       .4154
  11        …          .3594        .6375       .4601
  12        …          .3981        .6682       .4989
  13        …          .4329        .6943       .5328
  14      .6070        .4642        .7165       .5626
  15      .6307        .4924        .7358       .5889
  16      .6518        .5180        .7528       .6124
  17      .6707        .5411        .7675       .6334
  18      .6877        .5623        .7807       .6522
  19      .7030        .5817        .7925       .6693
  20      .7169        .5995        .8031       .6848
  21      .7294        .6159        .8126       .6989
  22      .7411        .6310        .8213       .7119
  23      .7518        .6450        .8292       .7237
  24      .7616        .6579        .8365       .7347
  25      .7707        .6700        .8431       .7448
  26      .7791        .6813        .8493       .7542
  27      .7869        .6918        .8549       .7629
  28      .7942        .7017        .8602       .7710
  29      .8010        .7110        .8651       .7786
  30      .8074        .7197        .8697       .7857
  31      .8133        .7279        .8739       .7924
  32      .8190        .7356        .8779       .7987
  42      .8609        .7943        .9073       .8454
  62      .9050        .8577        .9375       .8945
 122      .9513        .9261        .9684       .9460
   ∞     1.0000       1.0000       1.0000      1.0000

k = 3

   n     $L_{mvc}$ 5%   $L_{mvc}$ 1%   $L_{vc}$ 5%   $L_{vc}$ 1%
   4      .00029       .00001       .00064      .00003
   5        …          .0018        .0183       .0035
   6        …          .0112        .0618       .0198
   7        …          .0300        .1174       .0493
   8      .1165        .0559        .1749       .0866
   9      .1603        .0860        .2297       .1272
  10      .2028        .1181        .2802       .1682
  11      .2432        .1508        .3259       .2079
  12      .2808        .1829        .3670       .2457
  13      .3157        .2141        .4040       .2811
  14      .3480        .2439        .4373       .3141
  15      .3778        .2722        .4674       .3448
  16      .4052        .2990        .4946       .3732
  17      .4306        .3243        .5193       .3996
  18      .4540        .3482        .5418       .4240
  23      .5484        .4482        .6293       .5230
  33      .6660        .5811        .7326       .6470
  63      .8135        .7591        .8549       .8029
   ∞     1.0000       1.0000       1.0000      1.0000







TABLE II
5% and 1% points of $L_m$

        k = 2                  k = 3                  k = 4                  k = 5
   n     5%     1%        n     5%     1%        n     5%     1%        n     5%     1%
   2   .0062  .0002       2   .0500  .0100       2   .0973  .0328       2   .1354  .0589
   3   .0975  .0199       3   .2236  .1000       3   .2960  .1698       3   .3426  .2221
   4   .2285  .0808       4   .3684  .2154       4   .4372  .3002       4   .4793  .3566
   5   .3416  .1588       5   .4729  .3162       5   .5340  .4019       5   .5709  .4560
   6   .4307  .2352       6   .5493  .3981       6   .6033  .4800       6   .6356  .5302
   7   .5005  .3039       7   .6070  .4642       7   .6550  .5409       7   .6837  .5872
   8   .5559  .3637       8   .6518  .5180       8   .6950  .5895       8   .7206  .6321
   9   .6007  .4154       9   .6877  .5623       9   .7267  .6290      11   .7933  .7232
  10   .6375  .4601      10   .7169  .5995      10   .7525  .6617      16   .8559  .8043
  11   .6682  .4989      11   .7411  .6310      11   .7739  .6892      31   .9246  .8961
  12   .6943  .5328      12   .7616  .6579      21   .8788  .8290       ∞  1.0000 1.0000
  13   .7165  .5626      13   .7791  .6813      41   .9372  .9101
  14   .7358  .5889      14   .7942  .7017       ∞  1.0000 1.0000
  15   .7527  .6124      15   .8074  .7197
  16   .7675  .6334      16   .8190  .7356
  17   .7807  .6522      21   .8609  .7943
  18   .7925  .6693      31   .9050  .8577
  19   .8031  .6848      61   .9513  .9261
  20   .8126  .6989       ∞  1.0000 1.0000
  21   .8213  .7119
  22   .8292  .7237
  23   .8365  .7347
  24   .8431  .7448
  25   .8493  .7542
  26   .8549  .7629
  27   .8602  .7710
  28   .8651  .7786
  29   .8697  .7857
  30   .8739  .7924
  31   .8779  .7987
  41   .9073  .8454
  61   .9375  .8945
 121   .9684  .9460
   ∞  1.0000 1.0000






















freedom. Making use of the definitions of the quantities entering $L_m$, one finds
that F can be written as

(1.17)  $F = \dfrac{S_1/(k-1)}{S_2/[(n-1)(k-1)]}$

where $S_1 = n\sum_{i=1}^{k}(\bar{x}_i - \bar{x})^2$, and $S_2 = \sum_{i=1}^{k}\sum_{\alpha=1}^{n}(x_{i\alpha} - \bar{x}_\alpha - \bar{x}_i + \bar{x})^2$ and
$\bar{x}_\alpha = \frac{1}{k}\sum_{i=1}^{k} x_{i\alpha}$. Thus, the use of $L_m$ as a criterion for testing $H_m$ is equivalent
to an analysis of variance test for testing “row” effects in a k by n rectangular
layout when rows are associated with the k variables in the multivariate
population and columns are associated with the n individuals in the sample.
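
The equivalence can be checked numerically. The sketch below is illustrative only: the function name `f_two_ways` and the data are hypothetical, and it assumes $L_m = s^2(1-r) / [s^2(1-r) + \sum(\bar{x}_i - \bar{x})^2/(k-1)]$ with divisor n in $s_{ij}$, so that the ratio built from $L_m$ reduces to $(n-1)(1-L_m)/L_m$. It computes F once from $L_m$ and once from the row and residual sums of squares of the k by n layout.

```python
def f_two_ways(x):
    """Return (F computed from L_m, F computed from the two-way layout)."""
    k, n = len(x), len(x[0])
    row = [sum(r) / n for r in x]                                  # means of the k variables
    col = [sum(x[i][a] for i in range(k)) / k for a in range(n)]   # means of the n individuals
    grand = sum(row) / k

    def s(i, j):
        # sample covariance with divisor n
        return sum((x[i][a] - row[i]) * (x[j][a] - row[j]) for a in range(n)) / n

    s2 = sum(s(i, i) for i in range(k)) / k
    s2r = sum(s(i, j) for i in range(k) for j in range(k) if i != j) / (k * (k - 1))
    between = sum((m - grand) ** 2 for m in row)

    l_m = (s2 - s2r) / ((s2 - s2r) + between / (k - 1))
    f_from_lm = (n - 1) * (1 - l_m) / l_m

    s1 = n * between                                               # "row" sum of squares
    s2_resid = sum((x[i][a] - col[a] - row[i] + grand) ** 2
                   for i in range(k) for a in range(n))            # residual sum of squares
    f_anova = (s1 / (k - 1)) / (s2_resid / ((n - 1) * (k - 1)))
    return f_from_lm, f_anova


# Made-up scores for k = 3 variables and n = 5 individuals (illustration only).
scores = [[1, 2, 3, 4, 6], [2, 1, 4, 3, 7], [1, 3, 2, 5, 5]]
f_lm, f_aov = f_two_ways(scores)
print(f_lm, f_aov)  # the two values agree up to rounding error
```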


1.6. Approximate sampling theory of the test criteria for large samples. In
the case of large samples, it follows from a theorem [5] concerning the distribution
of likelihood ratio criteria for large samples that $-n \ln L_{mvc}$, $-n \ln L_{vc}$, and
$-n(k-1) \ln L_m$ are approximately distributed according to chi-square
distributions with $\frac{1}{2}k(k+3) - 3$, $\frac{1}{2}k(k+1) - 2$, and $k - 1$ degrees of freedom
respectively. Approximate 5% and 1% points of these three quantities taken from
Thompson’s [6] tables of the percentage points of the chi-square distribution
are given in Table III.

Table IV is given in order to furnish some idea of how the accuracy of the
approximations provided by Table III depends on n. It will be noted that the
approximate values exceed the exact values in every case, differences occurring
in the third decimal place in almost every case in which n exceeds 60. The
approximate percentages to which the approximate per cent points correspond
are given by the numbers in the parentheses in Table IV. These numbers in
each case were obtained by linear interpolation from the exact 5% and 1%
points.
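
The conversion from a chi-square percentage point to an approximate percentage point of a criterion amounts to solving $-n \ln L = \chi^2$ (or $-n(k-1) \ln L_m = \chi^2$) for L. A minimal sketch, with hypothetical function names and with the chi-square point supplied by the user from a table:

```python
import math


def degrees_of_freedom(k):
    """Chi-square degrees of freedom for the three large-sample statistics."""
    return {"mvc": k * (k + 3) // 2 - 3, "vc": k * (k + 1) // 2 - 2, "m": k - 1}


def approx_point(chi2_point, n, k=None):
    """Approximate percentage point of a criterion.

    Pass k only for L_m, whose statistic is -n (k - 1) ln L_m;
    omit k for L_mvc and L_vc, whose statistics are -n ln L.
    """
    scale = n if k is None else n * (k - 1)
    return math.exp(-chi2_point / scale)


print(degrees_of_freedom(3))                        # {'mvc': 6, 'vc': 4, 'm': 2}
print(round(approx_point(5.99147, 30), 4))          # L_mvc, k = 2, n = 30: 0.819
print(round(approx_point(5.99147, 31, k=3), 4))     # L_m, k = 3, n = 31: 0.9079
```

The two rounded values correspond to the approximate 5% points .8190 and .9079 listed in Table IV.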




1.7. Comparison of $L_{vc}$ with Mauchly’s “sphericity” test. The criterion
$L_{vc}$ for testing hypothesis $H_{vc}$ is, in a sense, an extension of a test developed by
Mauchly [7] for testing the hypothesis of “sphericity” of a normal multivariate
distribution. Mauchly’s test was designed for testing the hypothesis that all
variances are equal, and that all covariances are equal to zero irrespective of the
values of the population means. The likelihood criterion for testing this
hypothesis of “sphericity” is

(1.18)  $L_s = \dfrac{|s_{ij}|}{(s^2)^k}$

which should be compared with $L_{vc}$. Actually, Mauchly used $\sqrt{L_s}$ as the test
criterion, which, of course, is equivalent to using $L_s$. The g-th moment of $L_s$
when the hypothesis of sphericity is true is given by


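(1.19)  $M_g(L_s) = k^{kg} \prod_{i=1}^{k} \left[\dfrac{\Gamma(\frac{1}{2}(n-i) + g)}{\Gamma(\frac{1}{2}(n-i))}\right] \dfrac{\Gamma(\frac{1}{2}k(n-1))}{\Gamma(\frac{1}{2}k(n-1) + gk)}$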








which should be compared with the g-th moment of $L_{vc}$. Stated in other words,
Mauchly’s criterion $L_s$ is a test for the hypothesis that contours of equal
probability density in the multivariate normal population distribution are spheres,
while $L_{vc}$ is a test for the hypothesis that the contours of equal probability are
k-dimensional ellipsoids with k - 1 equal axes in general shorter than the k-th
axis which is equally inclined to the k coordinate axes of the distribution
function.


TABLE III
Approximate 5% and 1% points for $-n \ln L_{mvc}$, $-n \ln L_{vc}$ and $-n(k-1) \ln L_m$
for k = 2, 3, 4, 5, 6.

        $-n \ln L_{mvc}$                $-n \ln L_{vc}$                 $-n(k-1) \ln L_m$
  k    d.f.    5%        1%        d.f.    5%        1%        d.f.    5%        1%
  2      2    5.99147   9.21034      1    3.84146   6.63490      1    3.84146   6.63490
  3      6   12.5916   16.8119       4    9.48773  13.2767       2    5.99147   9.21034
  4     11   19.6751   24.7250       8   15.5073   20.0902       3    7.81473  11.3449
  5     17   27.5871   33.4087      13   22.3621   27.6883       4    9.48773  13.2767
  6     24   36.4151   42.9798      19   30.1435   36.1908       5   11.0705   15.0863


TABLE IV
Table indicating the accuracy of the approximate 5% and 1% points of $L_{mvc}$, $L_{vc}$
and $L_m$ provided by Table III

                              5%                          1%
criterion    k     n     exact    approx.           exact    approx.
$L_{mvc}$    2    30     .8074    .8190 (5.53)*     .7197    .7357 (1.73)*
$L_{mvc}$    2    62     .9050    .9079 (5.25)      .8577    .8619 (1.36)
$L_{mvc}$    2   122     .9513    .9521 (5.13)      .9261    .9273 (1.19)
$L_{mvc}$    3    33     .6660    .6828 (5.79)      .5811    .6008 (1.88)
$L_{mvc}$    3    63     .8135    .8188 (5.40)      .7591    .7658 (1.49)
$L_{vc}$     2    30     .8697    .8799 (5.49)      .7857    .8016 (1.76)
$L_{vc}$     2    62     .9375    .9399 (5.22)      .8945    .8985 (1.37)
$L_{vc}$     2   122     .9684    .9690 (5.11)      .9460    .9471 (1.20)
$L_{vc}$     3    33     .7326    .7501 (5.82)      .6470    .6688 (2.01)
$L_{vc}$     3    63     .8549    .8602 (5.41)      .8029    .8100 (1.55)
$L_m$        2    31     .8779    .8835 (5.28)      .7987    .8073 (1.43)
$L_m$        2    61     .9375    .9389 (5.13)      .8945    .8969 (1.20)
$L_m$        2   121     .9684    .9688 (5.07)      .9460    .9467 (1.13)
$L_m$        3    31     .9050    .9079 (5.25)      .8577    .8619 (1.36)
$L_m$        3    61     .9513    .9521 (5.10)      .9261    .9273 (1.14)
$L_m$        4    41     .9372    .9385 (5.19)      .9101    .9119 (1.26)
$L_m$        5    31     .9246    .9264 (5.25)      .8961    .8984 (1.32)

* The numbers in the parentheses are approximate percentages (obtained by linear
interpolation) to which the approximate percent points correspond.


1.8. Illustrative Example. As an example to illustrate the use of the test
criteria $L_{mvc}$, $L_{vc}$, $L_m$, we shall consider data on three forms of a subtest in
verbal aptitude, and inquire as to whether the data are consistent with the
hypothesis of the three forms being “parallel forms”.

A procedure³ was used for partitioning the first 60 of an entire test of 80 items
into three sets of 20 items each by using only a “difficulty” and a “validity”
index on each of the items. A random sample of 100 test booklets was selected
from those in which the first 60 items had been attempted. Total scores were
obtained on each of the three subtests selected in this manner. The question
is this: Does this procedure of selecting items produce “parallel” subtests?
In other words, considering the three scores on the three subtests in each of the
100 test booklets as a sample of 100 items from a trivariate normal population,
is the sample consistent with the hypothesis $H_{mvc}$ of equal means, equal variances
and equal covariances? If not, is the sample consistent with the hypothesis $H_{vc}$
of equal variances and equal covariances irrespective of means? If the answer
to this question is no, then the failure of the tests to be parallel is at least partially
attributable to differences in variances and/or differences in covariances. If
the answer to the question is yes, we test $H_m$, the hypothesis of equal means,
assuming equal variances and equal covariances. If the sample is not consistent
with $H_m$, then the subtests fail to be parallel because of significant differences in
means.

If we denote the three subtests by $T_1$, $T_2$, $T_3$, and the scores on the $\alpha$-th
individual in the sample on the three tests by $x_{1\alpha}$, $x_{2\alpha}$, $x_{3\alpha}$ respectively ($\alpha$ =
1, 2, ..., 100), the information in the sample needed for computing $L_{mvc}$,
$L_{vc}$ and $L_m$ and testing $H_{mvc}$, $H_{vc}$ and $H_m$ is as follows:


$\bar{x}_1$ = 10.9900          $s^2$ = 17.5558
$\bar{x}_2$ = 10.9300          $\tilde{s}^2$ = 17.5764
$\bar{x}_3$ = …
$s_{11}$ = 16.8451             $\tilde{r}$ = .7948
$s_{22}$ = 18.1099             $|s_{ij}|$ = 545.5308
$s_{33}$ = 17.7124
$s_{12}$ = 13.5493
$s_{13}$ = 14.5826
$s_{23}$ = 13.8056


³ Devised by Mr. L. R. Tucker of the College Entrance Examination Board. The author
is indebted to Mr. Tucker for the data used in the illustrative example. 













Using formulas (1.4), (1.5), and (1.6), for k = 3, for calculating the values of
$L_{mvc}$, $L_{vc}$ and $L_m$ we find


$L_{mvc}$ = .9209
$L_{vc}$ = .9370
$L_m$ = .9914


It will be seen from Table III that the 5% point of $-n \ln L_{mvc}$ for
k = 3 is 12.5916. Setting $-100 \ln L_{mvc} = 12.5916$ and solving we find the
approximate 5% point of $L_{mvc}$ to be .8817 which is considerably less than the
observed value of $L_{mvc}$, namely .9209. Hence, the sample is consistent with
$H_{mvc}$. As a matter of fact the observed value .9209 lies at approximately
the 25% point of $L_{mvc}$.

In practice, there would be no point in proceeding to test $H_{vc}$ or $H_m$, because
if $L_{mvc}$ is non-significant there is a high probability (not certainty) that both $L_{vc}$
and $L_m$ will be non-significant. But for illustrative purposes, it is perhaps useful
to consider $L_{vc}$ and $L_m$ anyway.

The 5% point of $-n \ln L_{vc}$ for k = 3 is 9.48773 (see Table III). Setting
$-100 \ln L_{vc} = 9.48773$ and solving, we get .9095 as the approximate 5% point
of $L_{vc}$, which is considerably less than the observed value .9370, thus indicating
that $L_{vc}$ is not significant at the 5% level. In fact the observed value .9370
lies between the 25% and 10% points of $L_{vc}$.

The 5% point of $-n(k-1) \ln L_m$ for k = 3 is 5.99147. Setting $-200 \ln L_m$ =
5.99147 and solving we get .9704 as the approximate 5% point. Since the
observed value of $L_m$ exceeds .9704, we find $L_m$ not significant at the 5% level. In
fact, the observed value .9914 lies between the 50% and 25% points.
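
The arithmetic of the example can be retraced from the summary figures alone. The following sketch is not the author's computation; it rests on three stated assumptions: $s_{ij}$ uses divisor n, the spread of the means is recovered as $\sum(\bar{x}_i - \bar{x})^2 = k(\tilde{s}^2 - s^2)$, and $L_{mvc}$ is obtained from the factorization $L_{mvc} = L_{vc} L_m^{k-1}$, which the three reported values satisfy to four decimals. Because the inputs are rounded to four decimals, agreement with the reported criteria is expected only to about three decimals.

```python
import math

# Sample variances and covariances reported for the three subtests.
s = [[16.8451, 13.5493, 14.5826],
     [13.5493, 18.1099, 13.8056],
     [14.5826, 13.8056, 17.7124]]
s2_tilde = 17.5764          # reported estimate of the common variance under H_mvc
k, n = 3, 100

# 3 x 3 determinant by cofactor expansion along the first row.
det3 = (s[0][0] * (s[1][1] * s[2][2] - s[1][2] * s[2][1])
        - s[0][1] * (s[1][0] * s[2][2] - s[1][2] * s[2][0])
        + s[0][2] * (s[1][0] * s[2][1] - s[1][1] * s[2][0]))

s2 = sum(s[i][i] for i in range(k)) / k
r = sum(s[i][j] for i in range(k) for j in range(k) if i != j) / (k * (k - 1) * s2)
between = k * (s2_tilde - s2)   # assumed: sum of squared deviations of the means

l_vc = det3 / (s2 ** k * (1 - r) ** (k - 1) * (1 + (k - 1) * r))
l_m = s2 * (1 - r) / (s2 * (1 - r) + between / (k - 1))
l_mvc = l_vc * l_m ** (k - 1)

# Approximate 5% points from the chi-square values of Table III.
p_mvc = math.exp(-12.5916 / n)
p_vc = math.exp(-9.48773 / n)
p_m = math.exp(-5.99147 / (n * (k - 1)))

print(round(l_mvc, 3), round(l_vc, 3), round(l_m, 3))
print(l_mvc > p_mvc, l_vc > p_vc, l_m > p_m)  # True means not significant at 5%
```

Each observed criterion exceeds its approximate 5% point, in line with the conclusions reached above.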


II. DERIVATION OF RESULTS 


In this part we shall derive the criteria $L_{mvc}$, $L_{vc}$ and $L_m$ for testing $H_{mvc}$,
$H_{vc}$ and $H_m$ by the Neyman-Pearson method of likelihood ratios, and determine
the distribution theory of the criteria.


2.1. The test L,.,. for Hm», the hypothesis of equality of means, equality of 
variances and equality of covariances. 

2.1.1. Derivation of the criterion $L_{mvc}$. Let $\Pi$ be a normal $k$-variate population,
in which $x_1, x_2, \dots, x_k$ are variables, such that $a_i$ is the mean of $x_i$, $\sigma_i^2$ the
variance of $x_i$ and $\rho_{ij}\sigma_i\sigma_j$ the covariance ($\rho_{ij}$ the correlation coefficient) between
$x_i$ and $x_j$. The distribution law of $x_1, x_2, \dots, x_k$ in the population is

$$\frac{|A_{ij}|^{1/2}}{(2\pi)^{k/2}} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} A_{ij}(x_i - a_i)(x_j - a_j)\Big] \tag{2.1}$$

where $\|A_{ij}\|$ is symmetric and is the inverse of the variance-covariance matrix,
i.e. $\|A_{ij}\|^{-1} = \|\rho_{ij}\sigma_i\sigma_j\|$, $(\rho_{ii} = 1)$.

Now suppose $O_n$ is a random sample of $n$ individuals from population $\Pi$,
and let $x_{i\alpha}$ be the value of $x_i$ for the $\alpha$th individual in the sample. Then
the probability function for the entire sample (likelihood function) is

$$P = \frac{|A_{ij}|^{n/2}}{(2\pi)^{kn/2}} \exp\Big[-\tfrac{1}{2}\sum_{\alpha=1}^{n}\sum_{i,j=1}^{k} A_{ij}(x_{i\alpha} - a_i)(x_{j\alpha} - a_j)\Big]. \tag{2.2}$$

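The hypothesis which we wish to test is that the population means $a_1,
a_2, \dots, a_k$ are equal, the variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$ are all equal and the covariances
$\rho_{12}\sigma_1\sigma_2, \rho_{13}\sigma_1\sigma_3, \dots, \rho_{k-1,k}\sigma_{k-1}\sigma_k$ are all equal, the test to be made on the
basis of the sample of values $x_{i\alpha}$. In other words, we wish to test the hypothesis
that

$$a_1 = a_2 = \cdots = a_k = a$$

$$\begin{Vmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1k}\sigma_1\sigma_k \\ \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \cdots & \rho_{2k}\sigma_2\sigma_k \\ \vdots & \vdots & & \vdots \\ \rho_{k1}\sigma_k\sigma_1 & \rho_{k2}\sigma_k\sigma_2 & \cdots & \sigma_k^2 \end{Vmatrix} = \begin{Vmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{Vmatrix} \tag{2.3}$$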

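Testing the hypothesis that (2.3) holds is equivalent to testing the hypothesis
that

$$a_1 = a_2 = \cdots = a_k = a$$

$$\begin{Vmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ \vdots & \vdots & & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kk} \end{Vmatrix} = \begin{Vmatrix} A & B & \cdots & B \\ B & A & \cdots & B \\ \vdots & \vdots & & \vdots \\ B & B & \cdots & A \end{Vmatrix} \tag{2.4}$$

where

$$A = \frac{1 + (k-2)\rho}{\sigma^2(1-\rho)(1+(k-1)\rho)}, \qquad B = \frac{-\rho}{\sigma^2(1-\rho)(1+(k-1)\rho)}. \tag{2.5}$$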
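
The equivalence of (2.3) and (2.4) rests on the closed form of the inverse of an equal-variance, equal-covariance matrix. A minimal numerical check is sketched below in plain Python; the function names and the sample values of $\sigma^2$, $\rho$ and $k$ are illustrative assumptions, not taken from the paper.

```python
def inverse_entries(sigma2, rho, k):
    """Diagonal entry A and off-diagonal entry B of the inverse matrix, as in (2.5)."""
    d = sigma2 * (1 - rho) * (1 + (k - 1) * rho)
    return (1 + (k - 2) * rho) / d, -rho / d

def product_with_covariance(sigma2, rho, k):
    """Multiply the covariance matrix of (2.3) by the matrix of (2.4)."""
    A, B = inverse_entries(sigma2, rho, k)
    cov = [[sigma2 if i == j else rho * sigma2 for j in range(k)] for i in range(k)]
    inv = [[A if i == j else B for j in range(k)] for i in range(k)]
    return [[sum(cov[i][l] * inv[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]

# Illustrative values only; the product should be the identity matrix.
prod = product_with_covariance(sigma2=2.0, rho=0.3, k=4)
```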


To obtain the likelihood criterion $L_{mvc}$ for testing the hypothesis $H_{mvc}$ we
maximize the likelihood (2.2) under two conditions, for the given sample $O_n$,
and take the ratio of the two resulting maxima. First, we maximize (2.2) over
the set $\Omega$ of admissible values of the parameters, i.e. with respect to all means
$a_i$ and all variances and covariances $\rho_{ij}\sigma_i\sigma_j$, denoting the resulting maximum
of (2.2) by $P_\Omega$. Secondly, we maximize (2.2) over the set of values $\omega$ of the
parameters which satisfy the hypothesis $H_{mvc}$; that is, we replace in (2.2) each
mean $a_1, a_2, \dots, a_k$ by $a$, each of the variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$ by $\sigma^2$ and
each of the covariances $\rho_{ij}\sigma_i\sigma_j$, $(i \neq j)$, by $\rho\sigma^2$, and then maximize (2.2) with
respect to $a$, $\sigma^2$ and $\rho$, denoting the resulting maximum by $P_\omega$.













Maximizing (2.2) under the first set of conditions is equivalent to maximizing
it with respect to the $a_i$ and the $A_{ij}$, while maximizing (2.2) under the second
set of conditions is equivalent to imposing condition (2.4) and maximizing it
with respect to $a$, $A$ and $B$.

The values of the $a_i$ and $A_{ij}$ which maximize (2.2) under the first set of
conditions are given by solving the following $(k^2 + 3k)/2$ equations.

$$\frac{\partial P}{\partial a_i} = 0, \qquad i = 1, 2, \dots, k \tag{2.6}$$

$$\frac{\partial P}{\partial A_{ij}} = 0, \qquad i, j = 1, 2, \dots, k, \ (i \le j). \tag{2.7}$$


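Expressions for these equations are

$$\Big[n \sum_{j=1}^{k} A_{ij}(\bar{x}_j - a_j)\Big] P = 0, \qquad i = 1, 2, \dots, k \tag{2.8}$$

$$\Big[\frac{n}{2} A^{ij} - \frac{1}{2}\sum_{\alpha=1}^{n} (x_{i\alpha} - a_i)(x_{j\alpha} - a_j)\Big] P = 0, \qquad i, j = 1, 2, \dots, k \ (i \le j), \tag{2.9}$$

where $A^{ij}$ is the element in the $i$th row and $j$th column of $\|A_{ij}\|^{-1}$, i.e.
$A^{ij} = \rho_{ij}\sigma_i\sigma_j$, and $\bar{x}_i = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}$.

The solution of (2.8) and (2.9) is

$$a_i = \bar{x}_i, \quad i = 1, 2, \dots, k; \qquad A^{ij} = s_{ij}, \ \text{or} \ A_{ij} = s^{ij}, \quad i, j = 1, 2, \dots, k, \ (i \le j) \tag{2.10}$$

where $s_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j)$, and where $\|s^{ij}\| = \|s_{ij}\|^{-1}$. Inserting
the values of (2.10) in (2.2) and noting that the exponent in (2.2) reduces
to $-\frac{n}{2}\sum_{i,j=1}^{k} s^{ij}s_{ij}$, which in turn reduces to $-\frac{1}{2}kn$, since $\sum_{i=1}^{k} s^{ij}s_{ij} = 1$
for each value of $j$, we obtain

$$P_\Omega = \frac{e^{-\frac{1}{2}kn}}{(2\pi)^{\frac{1}{2}kn}\,|s_{ij}|^{\frac{1}{2}n}}. \tag{2.11}$$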


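In order to obtain $P_\omega$, we specialize the $a_i$ and the matrix $\|A_{ij}\|$ in (2.2)
in accordance with (2.4), noting that the determinant $|A_{ij}|$ reduces to
$(A - B)^{k-1}(A + (k - 1)B)$, thus obtaining the following specialized form
of (2.2)

$$P' = \frac{[(A - B)^{k-1}(A + (k - 1)B)]^{\frac{1}{2}n}}{(2\pi)^{\frac{1}{2}kn}} \exp\Big\{-\tfrac{1}{2}\Big[A \sum_{\alpha=1}^{n}\sum_{i=1}^{k} (x_{i\alpha} - a)^2 + B \sum_{\alpha=1}^{n}\sum_{i \neq j = 1}^{k} (x_{i\alpha} - a)(x_{j\alpha} - a)\Big]\Big\}. \tag{2.12}$$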







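The values of $a$, $A$ and $B$ which maximize $P'$ are given by solving the following
three equations

$$\frac{\partial P'}{\partial a} = 0, \qquad \frac{\partial P'}{\partial A} = 0, \qquad \frac{\partial P'}{\partial B} = 0. \tag{2.13}$$

These equations are respectively

$$\Big[A \sum_{\alpha=1}^{n}\sum_{i=1}^{k} (x_{i\alpha} - a) + B \sum_{\alpha=1}^{n}\sum_{i \neq j = 1}^{k} (x_{j\alpha} - a)\Big] P' = 0$$

$$\Big[\frac{\frac{1}{2}n(k - 1)}{A - B} + \frac{\frac{1}{2}n}{A + (k - 1)B} - \frac{1}{2}\sum_{\alpha=1}^{n}\sum_{i=1}^{k} (x_{i\alpha} - a)^2\Big] P' = 0 \tag{2.14}$$

$$\Big[\frac{-\frac{1}{2}n(k - 1)}{A - B} + \frac{\frac{1}{2}n(k - 1)}{A + (k - 1)B} - \frac{1}{2}\sum_{\alpha=1}^{n}\sum_{i \neq j = 1}^{k} (x_{i\alpha} - a)(x_{j\alpha} - a)\Big] P' = 0.$$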


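Replacing $a_i$ by $\bar{x}_i$ in (2.15), putting $\frac{1}{n}\sum_{\alpha=1}^{n} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) = s_{ij}$, and
setting

$$\bar{x} = \frac{1}{nk}\sum_{\alpha=1}^{n}\sum_{i=1}^{k} x_{i\alpha} = \frac{1}{k}\sum_{i=1}^{k} \bar{x}_i$$

$$s_{0ij} = \frac{1}{n}\sum_{\alpha=1}^{n} (x_{i\alpha} - \bar{x})(x_{j\alpha} - \bar{x}) = s_{ij} + (\bar{x}_i - \bar{x})(\bar{x}_j - \bar{x})$$

$$r_0 = \frac{\sum_{i \neq j = 1}^{k} s_{0ij}}{(k - 1)\sum_{i=1}^{k} s_{0ii}} = \frac{\sum_{i \neq j} s_{ij} - \sum_{i} (\bar{x}_i - \bar{x})^2}{(k - 1)\big[\sum_{i} s_{ii} + \sum_{i} (\bar{x}_i - \bar{x})^2\big]} \tag{2.15}$$

$$s_0^2 = \frac{1}{k}\sum_{i=1}^{k} s_{0ii} = \frac{1}{k}\Big[\sum_{i=1}^{k} s_{ii} + \sum_{i=1}^{k} (\bar{x}_i - \bar{x})^2\Big]$$

we obtain as solutions of (2.14)

$$a = \bar{x}, \qquad A = \frac{1 + (k - 2)r_0}{s_0^2(1 - r_0)(1 + (k - 1)r_0)}, \qquad B = \frac{-r_0}{s_0^2(1 - r_0)(1 + (k - 1)r_0)}. \tag{2.16}$$

Substituting these in (2.12) we obtain

$$P_\omega = \frac{e^{-\frac{1}{2}kn}}{(2\pi)^{\frac{1}{2}kn}\big[(s_0^2)^k(1 - r_0)^{k-1}(1 + (k - 1)r_0)\big]^{\frac{1}{2}n}}. \tag{2.17}$$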













The likelihood ratio $\lambda_{mvc}$ for testing hypothesis $H_{mvc}$ is given by

$$\lambda_{mvc} = \frac{P_\omega}{P_\Omega}.$$

It will be convenient to use the $\frac{2}{n}$th root of $\lambda_{mvc}$ as the test criterion for $H_{mvc}$.
Denoting this criterion by $L_{mvc}$, we have

$$L_{mvc} = \frac{|s_{ij}|}{(s_0^2)^k(1 - r_0)^{k-1}(1 + (k - 1)r_0)}. \tag{2.18}$$

The use of $L_{mvc}$ as a test criterion is obviously equivalent to the use of $\lambda_{mvc}$.

It will be seen that $L_{mvc}$ is equal to unity when and only when the sample
means $\bar{x}_i$ are all equal, the sample variances $s_{ii}$ are all equal and the
sample covariances $s_{ij}$, $(i \neq j)$, are all equal. The greater the departure of
sample means from equality, sample variances from equality and sample
covariances from equality, the smaller will be the value of $L_{mvc}$, its value, of course,
always remaining between 0 and 1.
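
As an illustration of (2.18), the sketch below computes $L_{mvc}$ from a vector of sample means and a matrix of sample variances and covariances, using the quantities $s_0^2$ and $r_0$ of (2.15). It is written in plain Python with a small elimination routine for the determinant; the function names and the numerical inputs are hypothetical and serve only to show that the criterion equals 1 in the fully symmetric case and falls below 1 otherwise.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d

def L_mvc(means, s):
    """Criterion (2.18) from sample means and the matrix of s_ij (divisor n)."""
    k = len(means)
    xbar = sum(means) / k
    dev2 = sum((m - xbar) ** 2 for m in means)           # sum of (xbar_i - xbar)^2
    diag = sum(s[i][i] for i in range(k))
    off = sum(s[i][j] for i in range(k) for j in range(k) if i != j)
    s0_sq = (diag + dev2) / k                             # s_0^2 of (2.15)
    r0 = (off - dev2) / ((k - 1) * (diag + dev2))         # r_0 of (2.15)
    return det(s) / (s0_sq ** k * (1 - r0) ** (k - 1) * (1 + (k - 1) * r0))

# Hypothetical inputs: equal variances (2.0) and equal covariances (0.6).
s_sym = [[2.0, 0.6, 0.6], [0.6, 2.0, 0.6], [0.6, 0.6, 2.0]]
L_equal = L_mvc([5.0, 5.0, 5.0], s_sym)     # all sample means equal
L_unequal = L_mvc([5.0, 6.0, 7.0], s_sym)   # sample means differ
```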

2.1.2. Approximate distribution of $-n \ln L_{mvc}$ in large samples. In order to
make use of $L_{mvc}$ as a criterion for testing hypothesis $H_{mvc}$ we must find its
sampling distribution under the assumption that $H_{mvc}$ is true, i.e. that our sample
has, in fact, been drawn from a $k$-variate normal population having equal means,
equal variances and equal covariances. In the case of large samples, it follows
from a theorem on asymptotic distributions of likelihood ratios [5] that $-2 \ln \lambda_{mvc}$
(i.e. $-n \ln L_{mvc}$) is approximately distributed according to the chi-square law
with $\frac{1}{2}k(k + 3) - 3$ degrees of freedom (obtained by taking the difference
between the number of parameters used in maximizing $P$ to obtain $P_\Omega$ and that
used in maximizing $P'$ to obtain $P_\omega$).

Thus, to apply the test, one computes the value of $-n \ln L_{mvc}$ for the given
sample, and sees whether the obtained value is significant at the given probability
level (5% or 1%) using the chi-square table for $\frac{1}{2}k(k + 3) - 3$ degrees of freedom.
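
The large-sample procedure just described can be sketched as follows. The degrees of freedom are obtained by the parameter count given in the text; the chi-square critical value is supplied by the user from a table, and the function names are hypothetical. The numbers used are those of the illustrative example in Part I ($L_{mvc} = .9209$, $k = 3$, with $n = 100$ inferred from the text).

```python
import math

def df_mvc(k):
    """Degrees of freedom for -n ln L_mvc: parameters under Omega minus those under omega."""
    full = k + k * (k + 1) // 2   # k means plus k(k+1)/2 variances and covariances
    restricted = 3                # a, sigma^2 and rho
    return full - restricted

def mvc_large_sample_test(L, n, k, chi2_critical):
    """Return the statistic -n ln L, its degrees of freedom, and the decision."""
    statistic = -n * math.log(L)
    return statistic, df_mvc(k), statistic > chi2_critical

# 12.5912 is the 5% point quoted in the text for k = 3.
statistic, df, significant = mvc_large_sample_test(0.9209, 100, 3, 12.5912)
```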

To make a study of how closely the chi-square distribution approximates the
exact distribution of $-n \ln L_{mvc}$ for various values of $k$ and $n$ would be an arduous
task in computation. But existing experience with approximations to large
sample distributions indicates that the approximation in the present problem
would be satisfactory for small values of $k$ (say not more than 5) and values
of $n$ not less than 50. Some light is thrown on this question for $k = 2$ and 3
by Table IV.

2.1.3. Moments of the exact distribution of $L_{mvc}$. In Section 2.1.2 an approximation
is given to the distribution of $-n \ln L_{mvc}$ for large samples. As a matter
of fact, one can find expressions for the moments of the exact distribution of
$L_{mvc}$, which for the cases of $k = 2$ and $k = 3$ yield simple expressions for the
exact distribution of $L_{mvc}$.


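To find the moments of $L_{mvc}$ it will be noted that if one sets

$$n s_{ij} = a_{ij}, \qquad n s_{0ij} = a_{0ij}$$

in expression (2.18) for $L_{mvc}$, the following expression is obtained for $L_{mvc}$:

$$L_{mvc} = \frac{|a_{ij}|}{R_0^{k-1} S_0} \tag{2.19}$$

where

$$R_0 = \frac{1}{k}\sum_{i=1}^{k} a_{0ii} - \frac{1}{k(k - 1)}\sum_{i \neq j = 1}^{k} a_{0ij}, \qquad S_0 = \frac{1}{k}\Big(\sum_{i=1}^{k} a_{0ii} + \sum_{i \neq j = 1}^{k} a_{0ij}\Big). \tag{2.20}$$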


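It will be seen that $L_{mvc}$ depends on the $\bar{x}_i$ and the $a_{ij}$. In the case of a sample
from a general normal multivariate population, we know the $a_{ij}$ to be distributed
according to the Wishart [8] distribution function

$$w_{n-1,k}(a_{ij}; A_{ij}) = \frac{|A_{ij}|^{\frac{1}{2}(n-1)}\,|a_{ij}|^{\frac{1}{2}(n-k-2)} \exp\Big[-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}a_{ij}\Big]}{2^{\frac{1}{2}k(n-1)}\,\pi^{\frac{1}{4}k(k-1)}\,\prod_{i=1}^{k} \Gamma(\frac{1}{2}(n - i))} \tag{2.21}$$

and the means $\bar{x}_i$ to be independently distributed according to the normal
distribution

$$f(\bar{x}_i) = \frac{n^{\frac{1}{2}k}\,|A_{ij}|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}k}} \exp\Big[-\frac{n}{2}\sum_{i,j=1}^{k} A_{ij}(\bar{x}_i - a_i)(\bar{x}_j - a_j)\Big] \tag{2.22}$$

where the $A_{ij}$ and $a_i$ were defined in (2.1).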
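We now define a function $\varphi(g, u, v)$ as the mean value of $|a_{ij}|^g e^{uR_0 + vS_0}$ when
$H_{mvc}$ is true, i.e.,

$$\varphi(g, u, v) = E\big(|a_{ij}|^g e^{uR_0 + vS_0}\big) \tag{2.23}$$

where the right hand side denotes multiplication of (2.21) by (2.22) (after
imposing condition (2.4)) by $|a_{ij}|^g e^{uR_0 + vS_0}$ and then integration with respect to
the $a_{ij}$ and $\bar{x}_i$. This yields

$$\varphi(g, u, v) = 2^{kg} \prod_{i=1}^{k}\Big[\frac{\Gamma(\frac{1}{2}(n - i) + g)}{\Gamma(\frac{1}{2}(n - i))}\Big] \times \frac{(A - B)^{\frac{1}{2}n(k-1)}\,(A + (k - 1)B)^{\frac{1}{2}(n-1)}}{\Big(A - B - \frac{2u}{k - 1}\Big)^{\frac{1}{2}(k-1)(n+2g)}\,(A + (k - 1)B - 2v)^{\frac{1}{2}(n-1+2g)}}. \tag{2.24}$$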
















Now the $g$th moment $M_g(L_{mvc})$ of $L_{mvc}$ is defined by

$$M_g(L_{mvc}) = E[(L_{mvc})^g] \tag{2.25}$$

and is obtained by evaluating the partial derivative

$$\frac{\partial^{\,r(k-1)+s}\,\varphi}{\partial u^{r(k-1)}\,\partial v^{s}} \tag{2.26}$$

at $u = v = 0$, and then putting $r = -g$ and $s = -g$. The validity of this
operation for the range of values of $g$ in which we are interested can be
established by an argument based on analytic continuation. Alternatively, the same
result can be achieved by taking the indefinite integral of $\varphi$ $r(k - 1)$ times
successively with respect to $u$, and $s$ times successively with respect to $v$ (the lower
limit of integration being $-\infty$ in every case) and then evaluating the final
result at $u = v = 0$. Accordingly, we obtain for the $g$th moment of $L_{mvc}$,
when hypothesis $H_{mvc}$ is true, the following expression

$$M_g(L_{mvc}) = (k - 1)^{g(k-1)} \prod_{i=1}^{k}\Big[\frac{\Gamma(\frac{1}{2}(n - i) + g)}{\Gamma(\frac{1}{2}(n - i))}\Big] \times \frac{\Gamma(\frac{1}{2}(n - 1))\,\Gamma(\frac{1}{2}n(k - 1))}{\Gamma(\frac{1}{2}(n - 1) + g)\,\Gamma(\frac{1}{2}n(k - 1) + g(k - 1))}. \tag{2.27}$$
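2.1.4. Distribution of $L_{mvc}$ for $k = 2$ and 3. For $k = 2$, the criterion $L_{mvc}$
can be expressed as

$$L_{mvc} = \frac{\begin{vmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{vmatrix}}{\begin{vmatrix} \frac{1}{2}(s_{11} + s_{22}) + \frac{1}{4}(\bar{x}_1 - \bar{x}_2)^2 & s_{12} - \frac{1}{4}(\bar{x}_1 - \bar{x}_2)^2 \\ s_{12} - \frac{1}{4}(\bar{x}_1 - \bar{x}_2)^2 & \frac{1}{2}(s_{11} + s_{22}) + \frac{1}{4}(\bar{x}_1 - \bar{x}_2)^2 \end{vmatrix}}. \tag{2.28}$$

The $g$th moment of $L_{mvc}$ for $k = 2$ (obtained by putting $k = 2$ in (2.27)) is

$$M_g(L_{mvc}) = \frac{\Gamma(\frac{1}{2}n)\,\Gamma(\frac{1}{2}(n - 2) + g)}{\Gamma(\frac{1}{2}n + g)\,\Gamma(\frac{1}{2}(n - 2))} = \frac{\frac{1}{2}(n - 2)}{\frac{1}{2}(n - 2) + g}, \tag{2.29}$$

and the distribution function of $L_{mvc}$ is found to be

$$dF(L_{mvc}) = \tfrac{1}{2}(n - 2)\,L_{mvc}^{\frac{1}{2}(n-4)}\,dL_{mvc}, \qquad (0 \le L_{mvc} \le 1). \tag{2.30}$$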
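
For $k = 2$ the exact distribution (2.30) integrates to $F(L) = L^{\frac{1}{2}(n-2)}$, so the exact lower percentage point can be written down and compared with the chi-square approximation of Section 2.1.2 (2 degrees of freedom when $k = 2$). The sketch below assumes this reading of (2.30); the function names and the choice $n = 100$ are illustrative.

```python
import math

def exact_point_k2(n, alpha):
    """Exact lower alpha-point of L_mvc for k = 2, from F(L) = L**((n-2)/2)."""
    return alpha ** (2.0 / (n - 2))

def approx_point_k2(n, chi2_critical):
    """Large-sample point obtained from -n ln L = chi-square critical value (2 d.f.)."""
    return math.exp(-chi2_critical / n)

def moment_k2(n, g):
    """g-th moment from (2.29)."""
    c = 0.5 * (n - 2)
    return c / (c + g)

n = 100
exact = exact_point_k2(n, 0.05)
approx = approx_point_k2(n, 5.99147)   # 5% point for 2 d.f., as quoted in the text
```

With two degrees of freedom the chi-square 5% point equals $-2 \ln .05$, so the approximation amounts to replacing the exponent $2/(n - 2)$ by $2/n$; the approximate point is therefore slightly larger than the exact one.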


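For $k = 3$, $L_{mvc}$ can be written as

$$L_{mvc} = \frac{\begin{vmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{vmatrix}}{(s_0^2)^3(1 - r_0)^2(1 + 2r_0)} \tag{2.31}$$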







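where $s_0^2$ and $r_0$ are defined in (2.15) for $k = 3$. Putting $k = 3$ in (2.27) we
find the $g$th moment of $L_{mvc}$ for this case to be

$$M_g(L_{mvc}) = \frac{2^{2g}\,\Gamma(\frac{1}{2}(n - 2) + g)\,\Gamma(\frac{1}{2}(n - 3) + g)\,\Gamma(n)}{\Gamma(\frac{1}{2}(n - 2))\,\Gamma(\frac{1}{2}(n - 3))\,\Gamma(n + 2g)}. \tag{2.32}$$

By using the fact that

$$\Gamma(t)\,\Gamma(t + \tfrac{1}{2}) = \frac{\sqrt{\pi}\,\Gamma(2t)}{2^{2t-1}},$$

it is seen that $M_g(L_{mvc})$ reduces to

$$M_g(L_{mvc}) = \frac{\Gamma(n)\,\Gamma(n - 3 + 2g)}{\Gamma(n + 2g)\,\Gamma(n - 3)}, \tag{2.33}$$

from which we deduce the distribution of $L_{mvc}$ to be

$$dF(L_{mvc}) = \frac{\Gamma(n)}{2\,\Gamma(n - 3)}\,(\sqrt{L_{mvc}})^{n-4}\,(1 - \sqrt{L_{mvc}})^2\,d\sqrt{L_{mvc}}, \qquad (0 \le L_{mvc} \le 1). \tag{2.34}$$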


For values of $k > 3$, the exact distribution of $L_{mvc}$ seems to be too complicated
to lend itself to ready computation.

Thus, relatively simple exact tests of significance of $L_{mvc}$ can be set up for
$k = 2$ and $k = 3$ by using distribution functions (2.30) and (2.34) respectively.
For large values of $n$ we have pointed out that the significance of $L_{mvc}$ can be
tested by making use of the fact that $-n \ln L_{mvc}$ is approximately distributed
according to a chi-square law with $\frac{1}{2}k(k + 3) - 3$ degrees of freedom when $H_{mvc}$
is true.
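
For $k = 3$, (2.34) says that $\sqrt{L_{mvc}}$ has density proportional to $y^{n-4}(1 - y)^2$ on $(0, 1)$, which integrates to a polynomial. The sketch below, under that reading of (2.34), finds the exact lower percentage point by bisection so that it can be set beside the chi-square value; the function names and the tolerance are assumptions made for illustration.

```python
def cdf_sqrt_L(y, n):
    """CDF of y = sqrt(L_mvc) for k = 3, integrating the density in (2.34)."""
    c = (n - 1) * (n - 2) * (n - 3) / 2.0     # Gamma(n) / (2 Gamma(n - 3))
    return c * (y ** (n - 3) / (n - 3)
                - 2.0 * y ** (n - 2) / (n - 2)
                + y ** (n - 1) / (n - 1))

def exact_point_k3(n, alpha, tol=1e-12):
    """Lower alpha-point of L_mvc for k = 3, found by bisection on sqrt(L)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_sqrt_L(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (0.5 * (lo + hi)) ** 2

# n = 100 matches the sample size inferred for the illustrative example.
exact_5pct = exact_point_k3(100, 0.05)
```

For $n = 100$ the exact 5% point comes out a little below the approximate value .8817 obtained from the chi-square table in the illustrative example.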

For $k = 2$, $L_{mvc}$ is essentially a criterion for simultaneously testing, on the
basis of a sample, the hypothesis of equality of means and equality of variances
of a normal bivariate population.

It should be noted that if $H_{mvc}$ is true, or more realistically, is supported by
the sample as a result of applying test $L_{mvc}$, then population $\Pi$ is characterized
by the three parameters $a$, $\sigma^2$ and $\rho$ in (2.3). The likelihood estimates of these
parameters are $\bar{x}$, $s_0^2$ and $r_0$.


2.2. The test $L_{vc}$ for $H_{vc}$, the hypothesis of equality of variances and equality
of covariances, irrespective of the values of the means.

2.2.1. Derivation of the criterion $L_{vc}$. If, in testing hypothesis $H_{mvc}$ by means
of the criterion $L_{mvc}$, at a given level of significance, say $\epsilon$, a non-significant value
of $L_{mvc}$ is obtained, one states that the sample is consistent with the hypothesis
$H_{mvc}$ that all the population means are equal, the variances are equal and the
covariances are equal. Consideration of the Neyman-Pearson Type II error
involved in this statement would be very arduous and involved and will not be
attempted. On the other hand, if a significant value of $L_{mvc}$ is obtained, one
























states that the sample contradicts the hypothesis $H_{mvc}$ with probability $\epsilon$ of
making a Neyman-Pearson Type I error. In this case it may be reasonable to
inquire whether the sample would support the hypothesis if the variability
due to the means were eliminated. In other words, we may inquire whether the
sample supports the hypothesis $H_{vc}$ of equal variances and equal covariances,
irrespective of what values the population means may have. To obtain the
likelihood ratio criterion $L_{vc}$ for testing $H_{vc}$ we maximize the likelihood (2.2)
under the following two sets of conditions: first, with respect to the means $a_i$
and the variances and covariances $\rho_{ij}\sigma_i\sigma_j$; and secondly, with respect to the
means $a_i$ and $A$ and $B$, where $A$ and $B$ are obtained by imposing the condition
on the matrix $\|A_{ij}\|$ specified in (2.4). The maximum of (2.2) under the first
condition is given by (2.11). Denoting the maximum of (2.2) under the second
set of conditions by $P_{\omega'}$, it is found, by a procedure similar to that used in finding
$P_\omega$ (given by (2.17)), that $P_{\omega'}$ is given by


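$$P_{\omega'} = \frac{e^{-\frac{1}{2}kn}}{(2\pi)^{\frac{1}{2}kn}\big[(s^2)^k(1 - r)^{k-1}(1 + (k - 1)r)\big]^{\frac{1}{2}n}} \tag{2.35}$$

where

$$r = \frac{\sum_{i \neq j = 1}^{k} s_{ij}}{(k - 1)\sum_{i=1}^{k} s_{ii}}, \qquad s^2 = \frac{1}{k}\sum_{i=1}^{k} s_{ii}. \tag{2.36}$$

The likelihood ratio $\lambda_{vc}$ for testing $H_{vc}$ is given by

$$\lambda_{vc} = \frac{P_{\omega'}}{P_\Omega} = \frac{|s_{ij}|^{\frac{1}{2}n}}{(s^2)^{\frac{1}{2}kn}(1 - r)^{\frac{1}{2}n(k-1)}(1 + (k - 1)r)^{\frac{1}{2}n}}.$$

The test criterion which will be used for testing $H_{vc}$ is $L_{vc}$, the $\frac{2}{n}$th root of
$\lambda_{vc}$, i.e.,

$$L_{vc} = \frac{|s_{ij}|}{(s^2)^k(1 - r)^{k-1}(1 + (k - 1)r)}. \tag{2.37}$$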
2.2.2. Approximate distribution of $-n \ln L_{vc}$ in large samples.
In the case of large samples $-n \ln L_{vc}$ is approximately distributed according
to the chi-square law with $\frac{1}{2}k(k + 1) - 2$ degrees of freedom when hypothesis $H_{vc}$
is true.

2.2.3. Moments of the exact distribution of $L_{vc}$. The moments of $L_{vc}$ when
$H_{vc}$ is true can be found by a method similar to that used in Section 2.1.3 for
determining the moments of $L_{mvc}$. For it will be noted that $L_{vc}$ can be written as

$$L_{vc} = \frac{|a_{ij}|}{R^{k-1} S} \tag{2.38}$$

where

$$R = \frac{1}{k}\sum_{i=1}^{k} a_{ii} - \frac{1}{k(k - 1)}\sum_{i \neq j = 1}^{k} a_{ij}, \qquad S = \frac{1}{k}\Big(\sum_{i=1}^{k} a_{ii} + \sum_{i \neq j = 1}^{k} a_{ij}\Big), \tag{2.39}$$


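from which it is evident that $L_{vc}$ depends only on the $a_{ij}$, whose distribution in
the case of a general normal multivariate population is given by (2.21). We
now define a function $\theta(g, y, z)$ as the mean value of $|a_{ij}|^g e^{yR + zS}$ under the
assumption that $H_{vc}$ is true, i.e.,

$$\theta(g, y, z) = E\big(|a_{ij}|^g e^{yR + zS}\big) \tag{2.40}$$

where the value of the right hand side is obtained by multiplying (2.21) by
$|a_{ij}|^g e^{yR + zS}$, then imposing the condition on $\|A_{ij}\|$ stated in (2.4) and integrating
with respect to the $a_{ij}$. Accordingly, we find

$$\theta(g, y, z) = 2^{kg} \prod_{i=1}^{k}\Big[\frac{\Gamma(\frac{1}{2}(n - i) + g)}{\Gamma(\frac{1}{2}(n - i))}\Big] \times \frac{(A - B)^{\frac{1}{2}(k-1)(n-1)}\,(A + (k - 1)B)^{\frac{1}{2}(n-1)}}{\Big(A - B - \frac{2y}{k - 1}\Big)^{\frac{1}{2}(k-1)(n-1+2g)}\,(A + (k - 1)B - 2z)^{\frac{1}{2}(n-1+2g)}}. \tag{2.41}$$

The $g$th moment $M_g(L_{vc})$ of $L_{vc}$ is obtained by evaluating the partial derivative

$$\frac{\partial^{\,r(k-1)+s}\,\theta}{\partial y^{r(k-1)}\,\partial z^{s}} \tag{2.42}$$

at $y = z = 0$, and then setting $r = -g$ and $s = -g$. These operations yield

$$M_g(L_{vc}) = (k - 1)^{g(k-1)} \prod_{i=1}^{k}\Big[\frac{\Gamma(\frac{1}{2}(n - i) + g)}{\Gamma(\frac{1}{2}(n - i))}\Big] \times \frac{\Gamma(\frac{1}{2}(n - 1))\,\Gamma(\frac{1}{2}(k - 1)(n - 1))}{\Gamma(\frac{1}{2}(n - 1) + g)\,\Gamma(\frac{1}{2}(k - 1)(n - 1) + g(k - 1))}. \tag{2.43}$$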

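2.2.4. Distribution of $L_{vc}$ for $k = 2$ and 3. For $k = 2$, $L_{vc}$ can be expressed
as follows:

$$L_{vc} = \frac{\begin{vmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{vmatrix}}{\begin{vmatrix} \frac{1}{2}(s_{11} + s_{22}) & s_{12} \\ s_{21} & \frac{1}{2}(s_{11} + s_{22}) \end{vmatrix}} \tag{2.44}$$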


















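and the $g$th moment of $L_{vc}$ is given by

$$M_g(L_{vc}) = \frac{\Gamma(\frac{1}{2}(n - 1))\,\Gamma(\frac{1}{2}(n - 2) + g)}{\Gamma(\frac{1}{2}(n - 1) + g)\,\Gamma(\frac{1}{2}(n - 2))} \tag{2.45}$$

from which the distribution of $L_{vc}$ is deduced to be

$$dF(L_{vc}) = \frac{\Gamma(\frac{1}{2}(n - 1))}{\Gamma(\frac{1}{2})\,\Gamma(\frac{1}{2}(n - 2))}\,L_{vc}^{\frac{1}{2}(n-4)}\,(1 - L_{vc})^{-\frac{1}{2}}\,dL_{vc}, \qquad (0 \le L_{vc} \le 1). \tag{2.46}$$

For $k = 3$, $L_{vc}$ can be expressed as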





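$$L_{vc} = \frac{\begin{vmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{vmatrix}}{(s^2)^3(1 - r)^2(1 + 2r)} \tag{2.47}$$

where $s^2$ and $r$ are defined in (2.36) by setting $k = 3$. Setting $k = 3$ in (2.43),
we find as the $g$th moment of $L_{vc}$

$$M_g(L_{vc}) = \frac{2^{2g}\,\Gamma(\frac{1}{2}(n - 2) + g)\,\Gamma(\frac{1}{2}(n - 3) + g)\,\Gamma(n - 1)}{\Gamma(\frac{1}{2}(n - 2))\,\Gamma(\frac{1}{2}(n - 3))\,\Gamma(n - 1 + 2g)}. \tag{2.48}$$

Following the method by which (2.32) was reduced to (2.33), we find that the $g$th
moment of $L_{vc}$ reduces to

$$M_g(L_{vc}) = \frac{\Gamma(n - 1)\,\Gamma(n - 3 + 2g)}{\Gamma(n - 1 + 2g)\,\Gamma(n - 3)}, \tag{2.49}$$

and hence the distribution function of $L_{vc}$ for $k = 3$ is

$$dF(L_{vc}) = \frac{\Gamma(n - 1)}{\Gamma(n - 3)}\,(\sqrt{L_{vc}})^{n-4}\,(1 - \sqrt{L_{vc}})\,d\sqrt{L_{vc}}, \qquad (0 \le L_{vc} \le 1). \tag{2.50}$$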

For higher values of $k$ the distribution of $L_{vc}$ is apparently too complicated for
ready computation. But distributions (2.46) and (2.50) provide relatively
simple significance tests for the cases $k = 2$ and 3, respectively. For large
samples, we remark again that a significance test for $L_{vc}$ is provided by the fact
that $-2 \ln \lambda_{vc}$ (i.e., $-n \ln L_{vc}$) is approximately distributed according to the
chi-square law with $\frac{1}{2}k(k + 1) - 2$ degrees of freedom when $H_{vc}$ is true.

For $k = 2$, $L_{vc}$ is essentially a criterion for testing, on the basis of a sample,
the hypothesis of equality of variances of a normal bivariate population.

If $H_{vc}$ is true, $\Pi$ will be characterized by the parameters $a_1, a_2, \dots, a_k$, $\sigma^2$
and $\rho$. The maximum likelihood estimates of these parameters are $\bar{x}_1, \bar{x}_2, \dots,
\bar{x}_k$, $s^2$ and $r$, respectively.

2.3. The test $L_m$ for $H_m$, the hypothesis of equality of means, when the
variances are equal and covariances are equal.









2.3.1. Derivation of the criterion $L_m$. Suppose $L_{vc}$, described in Section 2.2.1
for testing $H_{vc}$, the hypothesis of equal variances and equal covariances, does
not have a significantly small value, thus indicating that the sample does not
contradict the hypothesis $H_{vc}$. Then, assuming that the original test $L_{mvc}$
of $H_{mvc}$ turned out to have a significantly small value, we may inquire as to
whether the significance of $L_{mvc}$ is due to the inequality of the population means
$a_i$. In this section we shall consider a criterion $L_m$ for testing the hypothesis
$H_m$ that the means $a_i$ are equal, assuming that the variances are equal and that
the covariances are equal. To test this hypothesis we maximize the likelihood (2.2)
under the following two sets of conditions: first, with respect to the $a_i$, $A$ and $B$,
where $A$ and $B$ are defined by the condition on $\|A_{ij}\|$ given in (2.4); secondly,
with respect to $a$, $A$ and $B$ where these parameters are specified by (2.4). The
maxima of the likelihood (2.2) under these two conditions are $P_{\omega'}$ and $P_\omega$,
given by (2.35) and (2.17) respectively. The likelihood ratio $\lambda_m$ is therefore


$$\lambda_m = \frac{P_\omega}{P_{\omega'}} = \Big[\frac{(s^2)^k(1 - r)^{k-1}(1 + (k - 1)r)}{(s_0^2)^k(1 - r_0)^{k-1}(1 + (k - 1)r_0)}\Big]^{\frac{1}{2}n}. \tag{2.51}$$

Now it follows from the definitions of $s_0^2$, $s^2$, $r_0$ and $r$, (2.15) and (2.36), that

$$s^2(1 + (k - 1)r) = s_0^2(1 + (k - 1)r_0)$$

and hence we may write

$$\lambda_m = \Big[\frac{s^2(1 - r)}{s_0^2(1 - r_0)}\Big]^{\frac{1}{2}n(k-1)}. \tag{2.52}$$

We can also express $\lambda_m^{2/n}$ as

$$\lambda_m^{2/n} = \Big(\frac{R}{R_0}\Big)^{k-1} \tag{2.53}$$

where $R_0$ and $R$ are defined by (2.20) and (2.39) respectively.
It will be most convenient for our purposes to use $L_m$, the $[2/n(k - 1)]$-th
root of $\lambda_m$, as the criterion for testing $H_m$, i.e.

$$L_m = \frac{R}{R_0} = \frac{s^2(1 - r)}{s_0^2(1 - r_0)} = \frac{s^2(1 - r)}{s^2(1 - r) + \frac{1}{k - 1}\sum_{i=1}^{k} (\bar{x}_i - \bar{x})^2}. \tag{2.54}$$


2.3.2. Approximate distribution of $-n(k - 1) \ln L_m$ in large samples.

In large samples $-2 \ln \lambda_m$ (i.e., $-n(k - 1) \ln L_m$) is approximately distributed
according to the chi-square law with $k - 1$ degrees of freedom. However,
the exact distribution of $L_m$ is relatively simple and will be derived.

2.3.3. Exact distribution of $L_m$ when $H_m$ is true. We shall determine the
distribution of $L_m$ by first finding the $g$th moment of $L_m$ when $H_m$ is true. For this
purpose we set up the function

$$\psi(p, q) = E\big(e^{pR + qR_0}\big) \tag{2.55}$$










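where the mean value is taken when $H_m$ is true, i.e., when the $a_i$ and $\|A_{ij}\|$
satisfy conditions (2.4). Now $R$ and $R_0$ are functions of the $a_{ij}$ and $\bar{x}_i$.
Hence, to find $E(e^{pR + qR_0})$ we multiply (2.21) by (2.22) by $e^{pR + qR_0}$ and impose
conditions (2.4), then take the integral over the entire space of the $a_{ij}$ and $\bar{x}_i$.
These operations yield

$$\psi(p, q) = \frac{(A - B)^{\frac{1}{2}n(k-1)}}{\Big(A - B - \frac{2(p + q)}{k - 1}\Big)^{\frac{1}{2}(n-1)(k-1)}\,\Big(A - B - \frac{2q}{k - 1}\Big)^{\frac{1}{2}(k-1)}}. \tag{2.56}$$

The $g$th moment of $L_m$ is obtained by performing the following differentiations

$$\frac{\partial^h}{\partial q^h}\Big[\frac{\partial^g \psi}{\partial p^g}\Big]_{p=0} \ \text{at} \ q = 0 \tag{2.57}$$

and then putting $h = -g$. These operations yield

$$M_g(L_m) = \frac{\Gamma(\frac{1}{2}(n - 1)(k - 1) + g)\,\Gamma(\frac{1}{2}n(k - 1))}{\Gamma(\frac{1}{2}(n - 1)(k - 1))\,\Gamma(\frac{1}{2}n(k - 1) + g)} \tag{2.58}$$

from which the distribution of $L_m$ (when $H_m$ is true) is found to be

$$dF(L_m) = \frac{\Gamma(\frac{1}{2}n(k - 1))}{\Gamma(\frac{1}{2}(n - 1)(k - 1))\,\Gamma(\frac{1}{2}(k - 1))}\,L_m^{\frac{1}{2}(n-1)(k-1)-1}\,(1 - L_m)^{\frac{1}{2}(k-1)-1}\,dL_m, \qquad (0 \le L_m \le 1). \tag{2.59}$$

Thus, we are able to make an exact test of significance of $L_m$ on the basis of
the function (2.59).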
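
Read as a beta distribution, (2.59) has first parameter $\frac{1}{2}(n - 1)(k - 1)$ and second parameter $\frac{1}{2}(k - 1)$. For $k = 3$ the second parameter is 1, so the distribution function is simply $L_m^{n-1}$ and the exact percentage point is available in closed form. The sketch below assumes this reading; the function names are hypothetical and $n = 100$ is the sample size inferred for the illustrative example.

```python
import math

def beta_parameters(n, k):
    """Parameters of the beta law of L_m read off from (2.59)."""
    return 0.5 * (n - 1) * (k - 1), 0.5 * (k - 1)

def exact_point_Lm_k3(n, alpha):
    """Exact lower alpha-point of L_m for k = 3, where F(L) = L**(n - 1)."""
    a, b = beta_parameters(n, 3)      # b equals 1 when k = 3
    return alpha ** (1.0 / a)

def approx_point_Lm(n, k, chi2_critical):
    """Large-sample point from -n(k-1) ln L_m = chi-square critical value."""
    return math.exp(-chi2_critical / (n * (k - 1)))

exact = exact_point_Lm_k3(100, 0.05)
approx = approx_point_Lm(100, 3, 5.99147)   # 5% point for k - 1 = 2 d.f., from the text
```

For $n = 100$ the exact 5% point is about .9702, slightly below the approximate value .9704 used in the illustrative example, so the conclusion reached there is unchanged.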






2.4. Relations between $L_{mvc}$, $L_{vc}$ and $L_m$.
It will be seen from the definitions of $L_{mvc}$, $L_{vc}$ and $L_m$ in (2.18), (2.37) and
(2.54) (noting that $s^2(1 + (k - 1)r) = s_0^2(1 + (k - 1)r_0)$) that

$$L_{mvc} = L_{vc}\,L_m^{k-1}.$$

Furthermore, it will be noted that when $H_{mvc}$ is true, the $g$th moment of $L_{mvc}$
given by (2.27) is equal to the product of the $g$th moment of $L_{vc}$ given by (2.43)
and the $g$th moment of $L_m^{k-1}$ (obtained by replacing $g$ by $g(k - 1)$ in (2.58)).
Thus, when $H_{mvc}$ is true $L_{mvc}$ is composed of the product of two independently
distributed quantities, namely $L_{vc}$ and $L_m^{k-1}$.
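
The factorization of moments stated above can be checked numerically. The sketch below evaluates the logarithms of (2.27), (2.43) and (2.58), as those formulas are read here from the printed text, using `math.lgamma`; the function names and the test values of $g$, $n$ and $k$ are illustrative assumptions.

```python
import math

def _lg_ratio(x, g):
    """log Gamma(x + g) - log Gamma(x)."""
    return math.lgamma(x + g) - math.lgamma(x)

def log_moment_vc(g, n, k):
    """Logarithm of the g-th moment of L_vc, formula (2.43)."""
    total = g * (k - 1) * math.log(k - 1)
    total += sum(_lg_ratio(0.5 * (n - i), g) for i in range(1, k + 1))
    total -= _lg_ratio(0.5 * (n - 1), g)
    total -= _lg_ratio(0.5 * (k - 1) * (n - 1), g * (k - 1))
    return total

def log_moment_m(g, n, k):
    """Logarithm of the g-th moment of L_m, formula (2.58)."""
    return _lg_ratio(0.5 * (n - 1) * (k - 1), g) - _lg_ratio(0.5 * n * (k - 1), g)

def log_moment_mvc(g, n, k):
    """Logarithm of the g-th moment of L_mvc, formula (2.27)."""
    total = g * (k - 1) * math.log(k - 1)
    total += sum(_lg_ratio(0.5 * (n - i), g) for i in range(1, k + 1))
    total -= _lg_ratio(0.5 * (n - 1), g)
    total -= _lg_ratio(0.5 * n * (k - 1), g * (k - 1))
    return total

def factorization_gap(g, n, k):
    """Difference between log M_g(L_mvc) and log[M_g(L_vc) * M_{g(k-1)}(L_m)]."""
    return log_moment_mvc(g, n, k) - (log_moment_vc(g, n, k)
                                      + log_moment_m(g * (k - 1), n, k))

gap = factorization_gap(1.5, 30, 4)   # should vanish up to rounding error
```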













REFERENCES 


[1] H. SCHEFFÉ, "A Note on the Behrens-Fisher Problem," Annals of Math. Stat., Vol. 15
(1944), pp. 430-432.

[2] JOHN W. TUKEY AND S. S. WILKS, "Approximation of the distribution of the product of
beta variables by a single beta variable," Annals of Math. Stat., Vol. 17 (1946),
pp. 318-324.

[3] K. PEARSON, Tables of the Incomplete Beta Function, Cambridge University Press, 1932.

[4] CATHERINE M. THOMPSON, "Table of percentage points of the Incomplete Beta Function,"
Biometrika, Vol. 32, Part III (1941), pp. 151-181.

[5] S. S. WILKS, "The large-sample distribution of the likelihood ratio for testing
composite hypotheses," Annals of Math. Stat., Vol. 9 (1938), pp. 60-62.

[6] CATHERINE M. THOMPSON, "Table of percentage points of the $\chi^2$ distribution,"
Biometrika, Vol. 32, Part II (1941), pp. 187-191.

[7] JOHN W. MAUCHLY, "Significance test for sphericity of a normal multivariate
distribution," Annals of Math. Stat., Vol. 11 (1940), pp. 204-209.

[8] J. WISHART, "The generalized product moment distribution in samples from a normal
multivariate population," Biometrika, Vol. 20A, pp. 32-52.












CONTRIBUTIONS TO THE THEORY OF SEQUENTIAL 
ANALYSIS, II, III 


By M. A. GIRSHICK





United States Department of Agriculture 


Summary. This is a continuation of a paper Part I of which was published in 
the June, 1946 issue of the Annals of Mathematical Statistics. The present paper 
is divided into two parts, Parts II and III, which are summarized as follows: 















Part II. The Exact Power Curve and the Distribution of n for Sequential Tests 
Where z Takes on a Finite Number of Integral Values. 


Consider a sequential test defined by a decision function $Z_n = \sum_{\alpha=1}^{n} z_\alpha$ with
boundaries $-b$ and $a$ where $a$ and $b$ are positive integers and $z_\alpha$ is the $\alpha$th
observation of a variate $z$ which takes on a finite number of integral values ranging
from the negative integer $-r$ to the positive integer $m$ with respective probabilities
$p_{-r}, \dots, p_m$. Let $\xi_{ai} = P[Z_n = (a + i)]$, $(i = 0, 1, \dots, m - 1)$, and $\xi_{bj} =
P[Z_n = -(b + j)]$, $(j = 0, 1, \dots, r - 1)$. Furthermore, let $A$ be a square matrix
of $a + b - 1$ rows and columns with elements defined by: $a_{ii} = 1 - p_0$ for all $i$;
$a_{i,i+k} = -p_k$ for $k = 1, 2, \dots, m$; $a_{i,i-j} = -p_{-j}$ for $j = 1, 2, \dots, r$; and $a_{ij} = 0$
otherwise.
It is proved that

$$\text{(i)} \qquad \xi_{bj} = \sum_{i=0}^{r-j-1} p_{i-r}\,A^{r-j-i,\,b}; \qquad (j = 0, 1, \dots, r - 1)$$

$$\text{(ii)} \qquad \xi_{aj} = \sum_{i=0}^{m-j-1} p_{i+j+1}\,A^{a+b-i-1,\,b}; \qquad (j = 0, 1, \dots, m - 1)$$

where $A^{kb}$ is the element of the $k$th row and $b$th column in $A^{-1}$. Let $E_{aj}\tau^n$
and $E_{bj}\tau^n$ be the conditional generating functions of $n$ under the restriction that
$Z_n = (a + j)$ and $Z_n = -(b + j)$ respectively. Then $\xi_{bj}E_{bj}\tau^n$ is obtained by
substituting $\tau p_i$ for each $p_i$ occurring in equation (i) and $\xi_{aj}E_{aj}\tau^n$ is obtained by
substituting $\tau p_i$ for each $p_i$ occurring in equation (ii). The probability that
$Z_n = a + j$ in exactly $n$ steps is given by the coefficient of $\tau^n$ in the expansion of
$\xi_{aj}E_{aj}\tau^n$ in a power series in $\tau$. The probability that $Z_n = -(b + j)$ in exactly
$n$ steps is similarly obtained.

This method is applied to the derivation of the exact power function and the 
distribution of n for the sequential binomial probability ratio test. 


Part III. On Conjugate Distributions. 


Consider a random variable $X$ with a distribution density $f(x, \theta)$ which satisfies
certain specified conditions. Let $\theta_1$ and $\theta_2$ be two values of $\theta$ and let $z =
\log (f(x, \theta_2)/f(x, \theta_1))$. For any hypothesis $\theta = \theta'$, let $\varphi(t \mid \theta')$ be the moment
generating function of $z$ and $h$ the non-zero value of $t$ for which $\varphi(t \mid \theta') = 1$.
We set $F(x) = e^{hz}f(x, \theta')$. Then $f$ and $F$ are conjugate distributions. If
$F = f(x, \theta'')$, then $\theta'$ and $\theta''$ are defined as conjugate pairs.

A method is given for obtaining the totality of conjugate pairs for the general
class of distributions which admit a sufficient statistic. It is then shown that
the power of the sequential probability ratio test based on such distributions is
given explicitly in terms of these pairs. It is proven that within the approximation
obtained by neglecting the excess of $|Z_n|$ over $a$ and $b$ at a decision point
the following relationship holds:

$$P_1(n \mid F) = e^{ha}P_1(n \mid f)$$
$$P_2(n \mid F) = e^{-hb}P_2(n \mid f)$$

where $P_1(n \mid g)$ and $P_2(n \mid g)$ stand for the probability that $Z_n \ge a$ and $Z_n \le -b$
respectively in exactly $n$ steps under the hypothesis $g$.


II. THE EXACT POWER CURVE AND THE DISTRIBUTION OF $n$ FOR SEQUENTIAL
TESTS WHERE $z$ TAKES ON A FINITE NUMBER OF INTEGRAL VALUES

2.1. General discussions. Let a sequential test be defined by a decision function
$Z_n = \sum_{\alpha=1}^{n} z_\alpha$ with boundaries $-b$ and $a$ where $a$ and $b$ are positive and $z_\alpha$
is the $\alpha$th observation of a variate $z$ which takes on a finite number of integral
values, $-r, -r + 1, \dots, -1, 0, 1, 2, \dots, m$. Let $P(z = i) = p_i$ where $P(z = i)$
stands for the probability that $z$ takes on the value $i$. We shall assume without
any loss of generality that $a$ and $b$ are integers.

When the sequential test terminates with $Z_n \ge a$, the possible values that $Z_n$
can take on are: $a, a + 1, \dots, a + m - 1$. Similarly, when the sequential
test terminates with $Z_n \le -b$, the possible values which $Z_n$ can take on are:
$-b, -(b + 1), \dots, -(b + r - 1)$. Let $\xi_{ai} = P[Z_n = (a + i)]$, $i = 0, 1, \dots,
m - 1$ and $\xi_{bi} = P[Z_n = -(b + i)]$, $i = 0, 1, \dots, r - 1$.

For any variate $u$, let the symbol $E_{bi}(u)$ stand for the expected value of $u$
under the restriction that $Z_n = -(b + i)$, and the symbol $E_{ai}(u)$ stand for the
expected value of $u$ under the restriction that $Z_n = a + i$. Let $\varphi(t)$ be the
generating function of $z$. Then

$$\varphi(t) = \sum_{i=-r}^{m} p_i t^i. \tag{2.101}$$


In terms of the generating function, the Fundamental Identity (see section
2.32 in [6]) can be written as

$$\sum_{i=0}^{r-1} \xi_{bi}\,t^{-(b+i)}E_{bi}[\varphi(t)]^{-n} + \sum_{i=0}^{m-1} \xi_{ai}\,t^{a+i}E_{ai}[\varphi(t)]^{-n} = 1. \tag{2.102}$$

It follows from (2.102) that for all values of $t$ for which

$$\varphi(t) = \sum_{i=-r}^{m} p_i t^i = 1, \tag{2.103}$$

$$\psi(t) = \sum_{i=0}^{r-1} \xi_{bi}\,t^{-(b+i)} + \sum_{i=0}^{m-1} \xi_{ai}\,t^{a+i} = 1 \tag{2.104}$$

where $\psi(t)$ is the generating function of $Z_n$.

In the paper "The cumulative sums of random variables" [2] Wald has given
the following method for obtaining the probabilities $\xi_{bi}$ and $\xi_{ai}$. Let $t_1, t_2,
\dots, t_{r+m}$ be the $r + m$ roots of (2.103). Substituting these in (2.104) we get
$r + m$ linear equations in the $r + m$ unknowns, $\xi_{bi}$ and $\xi_{ai}$. Thus, if the
determinant of these equations is different from zero, the unknowns can be solved
in terms of the roots of (2.103). In a similar manner, the characteristic function
of $n$ under the restriction that $Z_n = i$ can also be obtained.

The above method has two disadvantages. First, it involves solving for
all the roots of a polynomial which will often be of a high degree and second, it
involves solving a set of linear equations with coefficients which are powers of
complex numbers.

The method outlined below is in many respects much simpler. It requires
only the evaluation of one column of the inverse of a matrix of $a + b - 1$ rows
and columns. The elements of the matrix are given explicitly and are either
0, 1 or $p_i$. This permits obtaining general solutions for special classes of
sequential tests.

2.2. Derivation of the exact power functions. We multiply $\varphi(t) - 1$ by $t^r$
and $\psi(t) - 1$ by $t^{b+r-1}$ and obtain two polynomials.















$$f(t) = \sum_{i=0}^{m+r} (p_{i-r} - \delta_{ir})\,t^i \tag{2.201}$$

and

$$g(t) = \sum_{j=0}^{r-1} \xi_{bj}\,t^{r-j-1} + \sum_{j=0}^{m-1} \xi_{aj}\,t^{a+b+r+j-1} - t^{b+r-1} \tag{2.202}$$

where $\delta_{jk} = 1$ when $j = k$ and zero otherwise.

By the Fundamental Identity, every root of $f(t)$ is also a root of $g(t)$. Since
$f(t)$ is of degree $m + r$ and $g(t)$ is of degree $a + b + m + r - 2$, it must follow
that $g(t)$ equals $f(t)$ times a polynomial of degree $a + b - 2$. That is,

$$g(t) = f(t) \sum_{i=0}^{a+b-2} c_i t^i \tag{2.203}$$

where the $c$'s are undetermined constants. Substituting from (2.201) in (2.203)
we obtain

$$g(t) = \sum_{j=0}^{a+b+m+r-2} Q_j t^j \tag{2.204}$$









1 It is assumed here that f(t) has no multiple roots. The author conjectures that this is 
true for the polynomial under consideration for all values of p. 



















where

$$Q_j = \sum_{i=0}^{j} (p_{i-r} - \delta_{ir})\,c_{j-i}. \tag{2.205}$$

Comparing the coefficients of (2.204) with those of (2.202) and taking into
account the fact that $p_k = 0$ when $k > m$ and $c_k = 0$ when $k > a + b - 2$, we get

$$\xi_{bj} = \sum_{i=0}^{r-j-1} p_{i-r}\,c_{r-j-i-1}, \qquad (j = 0, 1, \dots, r - 1), \tag{2.206}$$

and

$$\xi_{aj} = \sum_{i=0}^{m-j-1} p_{i+j+1}\,c_{a+b-i-2}, \qquad (j = 0, 1, \dots, m - 1). \tag{2.207}$$

Thus, if the $c$'s (we require only the first $r$ and the last $m$) are determined, the
probabilities $\xi_{bj}$ and $\xi_{aj}$ are also determined from (2.206) and (2.207). But, if
we examine the structure of $g(t)$ in (2.202) we see that the coefficients of all the
powers of $t$ from $r$ to $(a + b + r - 2)$ inclusive are zero except for the
coefficient of $t^{b+r-1}$ which is equal to $-1$. Consequently, if in (2.204) we set
$Q_j = -\delta_{j,b+r-1}$ for all $j = r, r + 1, \dots, a + b + r - 2$, we shall have the
required number of equations to solve for the $a + b - 1$ unknown $c$'s.
In view of (2.205) these equations can be written as

$$\sum_{i=0}^{j} (\delta_{ir} - p_{i-r})\,c_{j-i} = \delta_{j,b+r-1}, \qquad (j = r, \dots, a + b + r - 2). \tag{2.208}$$

Changing the range of the subscript $j$, we get

$$\sum_{i=0}^{j+r-1} (\delta_{ir} - p_{i-r})\,c_{j+r-i-1} = \delta_{jb}, \qquad (j = 1, 2, \dots, a + b - 1), \tag{2.209}$$

with the understanding that $p_k = 0$ when $k > m$ and $c_k = 0$ when $k > a + b - 2$.

Let $A$ be the matrix of the equations in (2.209). Then $A$ is of the following
form. The elements in the main diagonal are $(1 - p_0)$. In the diagonals to
the right of and parallel to the main diagonal, the elements are $-p_{-1}, -p_{-2}, \dots,
-p_{-r}, 0, \dots, 0$ successively; in the diagonals to the left of and parallel
to the main diagonal, the elements are $-p_1, -p_2, \dots, -p_m, 0, \dots, 0$
successively. Assume that the determinant of $A$ is different from zero² and let
$A^{-1}$ be the inverse of $A$. Let the elements of $A^{-1}$ be designated by $A^{ij}$, $(i, j =
1, 2, \dots, a + b - 1)$. Then, in view of (2.209) we get

$$c_i = A^{i+1,\,b}, \qquad (i = 0, 1, 2, \dots, a + b - 2). \tag{2.210}$$

Finally, from (2.206) and (2.207), we have,

$$\xi_{bj} = \sum_{i=0}^{r-j-1} p_{i-r}\,A^{r-j-i,\,b}, \qquad (j = 0, 1, 2, \dots, r - 1), \tag{2.211}$$


2 P. L. Hsu has submitted a simple proof to the author that A is non-singular. 













286 M. A. GIRSHICK 


and

(2.212)    $\xi_{aj} = \sum_{i=0}^{m-j-1} p_{i+j+1}\, A^{a+b-i-1,\,b}, \quad (j = 0, 1, 2, \ldots, m-1),$

where, as before, it is understood that $p_k = 0$ when $k > m$ and $A^{kb} = 0$ when
$k > a + b - 1$.

From (2.211) and (2.212) we can obtain the probability that $Z_n \le -b$ and
the probability that $Z_n \ge a$ since these are given by

$\sum_{j=0}^{r-1} \xi_{bj}$  and  $\sum_{j=0}^{m-1} \xi_{aj} \;\bigl(= 1 - \sum_{j=0}^{r-1} \xi_{bj}\bigr)$

respectively. We can also obtain $E(n)$, the average number of steps required
to reach a decision. For, if we differentiate (2.102) with respect to t and
set $t = 1$, we get

(2.213)    $E(n) = \dfrac{E Z_n}{E z} = \dfrac{\sum_{i=0}^{m-1} \xi_{ai}(a+i) - \sum_{i=0}^{r-1} \xi_{bi}(b+i)}{\sum_{i=-r}^{m} i\, p_i}.$
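The computation described in (2.206) through (2.209) can be sketched numerically. The following is a minimal illustration, not the author's procedure: it assumes the equations read as $\sum_i (\delta_{ir} - p_{i-r}) c_{j+r-i-1} = \delta_{jb}$, solves them by ordinary Gauss-Jordan elimination in exact rational arithmetic, and then forms the exit probabilities. The function names `solve_linear` and `exit_probabilities` are hypothetical.

```python
from fractions import Fraction

def solve_linear(M):
    """Gauss-Jordan elimination on an augmented matrix of Fractions."""
    n = len(M)
    for col in range(n):
        piv = next(row for row in range(col, n) if M[row][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for row in range(n):
            if row != col and M[row][col] != 0:
                f = M[row][col]
                M[row] = [x - f * y for x, y in zip(M[row], M[col])]
    return [row[-1] for row in M]

def exit_probabilities(p, r, m, a, b):
    """p maps each integer i in [-r, m] to P(z = i) as a Fraction.
    Returns (xi_b, xi_a): xi_b[j] = P(Z_n = -(b+j)), xi_a[j] = P(Z_n = a+j)."""
    pk = lambda k: p.get(k, Fraction(0))
    N = a + b - 1
    M = []
    for j in range(1, N + 1):
        # coefficient of c_k in equation j of (2.209): delta_{k,j-1} - p_{j-1-k}
        row = [Fraction(int(k == j - 1)) - pk(j - 1 - k) for k in range(N)]
        row.append(Fraction(int(j == b)))  # right-hand side delta_{jb}
        M.append(row)
    c = solve_linear(M)
    ck = lambda k: c[k] if 0 <= k < N else Fraction(0)
    xi_b = [sum(pk(i - r) * ck(r - j - i - 1) for i in range(r - j)) for j in range(r)]
    xi_a = [sum(pk(i + j + 1) * ck(a + b - i - 2) for i in range(m - j)) for j in range(m)]
    return xi_b, xi_a
```

For the symmetric walk with steps -1 and +1 and boundaries b = 2, a = 3, this sketch should reproduce the classical ruin probabilities 3/5 and 2/5.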














2.3. Derivation of the probability that the sequential test will terminate in
exactly n steps. Let $\phi(t)$ be the generating function of z and $\psi(t, \tau)$ the joint
generating function of $Z_n$ and n. Then

(2.301)    $\phi(t) = \sum_{i=-r}^{m} p_i t^i$

and

(2.302)    $\psi(t, \tau) = \sum_{i=0}^{r-1} \xi_{bi}\, t^{-(b+i)} E_{bi}\tau^n + \sum_{i=0}^{m-1} \xi_{ai}\, t^{a+i} E_{ai}\tau^n.$

Furthermore, let $\phi_1(t, \tau) = \tau\phi(t) - 1$ and $\psi_1(t, \tau) = \psi(t, \tau) - 1$. In terms of
these functions, the Fundamental Identity can be stated as follows: For a fixed
$\tau$, every root of $\phi_1(t, \tau)$ is also a root of $\psi_1(t, \tau)$. Let $f(t, \tau) = t^r \phi_1(t, \tau)$ and
$g(t, \tau) = t^{b+r-1} \psi_1(t, \tau)$. Then

(2.303)    $f(t, \tau) = \sum_{i=0}^{m+r} (\tau p_{i-r} - \delta_{ir})\, t^i$

and

(2.304)    $g(t, \tau) = \sum_{j=0}^{r-1} (\xi_{bj} E_{bj}\tau^n)\, t^{r-j-1} - t^{b+r-1} + t^{a+b+r-1} \sum_{j=0}^{m-1} (\xi_{aj} E_{aj}\tau^n)\, t^j.$

Since for a fixed $\tau$, every root of $f(t, \tau)$ is a root of $g(t, \tau)$, and since $f(t, \tau)$
is a polynomial in t of degree $m + r$ and $g(t, \tau)$ is a polynomial in t of degree
$a + b + m + r - 2$, it must follow that³





3 See footnote 1, section 2.2. 




(2.305)    $g(t, \tau) = f(t, \tau) \sum_{j=0}^{a+b-2} d_j t^j.$


The rest of the argument is identical with that of section 2.2 except that the
unknowns in this case are $\xi_{bj} E_{bj}\tau^n$ and $\xi_{aj} E_{aj}\tau^n$ and are given by

(2.306)    $\xi_{bj} E_{bj}\tau^n = \sum_{i=0}^{r-j-1} \tau p_{i-r}\, d_{r-j-i-1}, \quad (j = 0, 1, \ldots, r-1),$

and

(2.307)    $\xi_{aj} E_{aj}\tau^n = \sum_{i=0}^{m-j-1} \tau p_{i+j+1}\, d_{a+b-i-2}, \quad (j = 0, 1, \ldots, m-1),$

(see (2.206) and (2.207)) where the d's are obtained by solving the linear equations:

(2.308)    $\sum_{i=0}^{j+r-1} (\delta_{ir} - \tau p_{i-r})\, d_{j+r-i-1} = \delta_{jb}, \quad (j = 1, 2, \ldots, a+b-1),$

(see (2.209)). Thus, we see that the solution for $\xi_{bi} E_{bi}\tau^n$ is obtainable from
the solution given in 2.2 for $\xi_{bi}$ by substituting $\tau p_i$ for every $p_i$ appearing in the
expression (2.211). Similarly, the solution for $\xi_{ai} E_{ai}\tau^n$ is obtainable from the
solution given for $\xi_{ai}$ by substituting $\tau p_i$ for every $p_i$ appearing in the expression
(2.212).

Let $p(Z_n = k \mid n)$ stand for the probability that $Z_n = k$ in exactly n steps and
let $p_{ai}(n) = p[Z_n = (a + i) \mid n]$ and $p_{bi}(n) = p[Z_n = -(b + i) \mid n]$. Then
$p_{ai}(n)$ and $p_{bi}(n)$ are given by the coefficient of $\tau^n$ in the expansion of $\xi_{ai} E_{ai}\tau^n$
and $\xi_{bi} E_{bi}\tau^n$ respectively in a power series in $\tau$. That the expansions are valid
can be seen from the following considerations: If we examine the solutions given
for $\xi_{ai} E_{ai}\tau^n$ $(i = 0, 1, \ldots, m-1)$, and $\xi_{bi} E_{bi}\tau^n$ $(i = 0, 1, \ldots, r-1)$, we see
that each is a ratio of two polynomials in $\tau$; the polynomial in the denominator
is, in each case, the determinant of the linear equations (2.308). Now, it is easy
to see that this determinant equals 1 when $\tau = 0$. Hence the expansions are
valid in a neighborhood of $\tau = 0$.⁴


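Let $p_{an} = p[Z_n \ge a \mid n]$ and $p_{bn} = p[Z_n \le -b \mid n]$; then

(2.309)    $p_{an} = \sum_{i=0}^{m-1} p_{ai}(n)$

and

(2.310)    $p_{bn} = \sum_{i=0}^{r-1} p_{bi}(n).$

We have also:

(2.311)    $\sum_{n=m_a}^{\infty} p_{an} = \sum_{i=0}^{m-1} \xi_{ai} = p(Z_n \ge a)$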


⁴ It can be seen from (2.303) that for a fixed $\tau$, $f(t, \tau) = 0$ implies that $\phi(t) = 1/\tau$. Hence
if $\tau < 1$, $\phi(t) > 1$. Thus, the Fundamental Identity is valid in the neighborhood of $\tau = 0$.




and

(2.312)    $\sum_{n=m_b}^{\infty} p_{bn} = \sum_{i=0}^{r-1} \xi_{bi} = p(Z_n \le -b)$

where $m_a$ is the smallest integer greater than or equal to $a/m$, and $m_b$ is the
smallest integer greater than or equal to $b/r$.


2.4. Application of the method to the binomial distribution. We shall 
consider the binomial in terms of acceptance inspection although the results 
are general. 

Let a sequential acceptance inspection plan be defined by $p_1$, $p_2$, $\alpha$ and $\beta$
where $p_1$ is the fraction defective which can be tolerated in the lot, $p_2$ is the
fraction defective which cannot be tolerated, $\alpha$ is the maximum probability that the
lot will be rejected when the fraction defective is $p_1$ or less and $\beta$ is the maximum
probability that the lot will be accepted when the fraction defective is $p_2$ or
greater. Then the sequential criterion is given by two parallel lines ([1] and [3]).

















(2.401)    $d_1 = -h_1 + sn$

(2.402)    $d_2 = h_2 + sn$

where

(2.403)    $h_1 = \dfrac{\log \frac{1-\alpha}{\beta}}{\log \frac{p_2(1-p_1)}{p_1(1-p_2)}},$

(2.404)    $h_2 = \dfrac{\log \frac{1-\beta}{\alpha}}{\log \frac{p_2(1-p_1)}{p_1(1-p_2)}},$

(2.405)    $s = \dfrac{\log \frac{1-p_1}{1-p_2}}{\log \frac{p_2(1-p_1)}{p_1(1-p_2)}},$

and n is the number of observations taken sequentially. We assume that
$\alpha + \beta < 1$ and $p_1 < p_2$. Then $h_1$ and $h_2$ are positive and s lies between 0 and 1.
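As a small numerical illustration of (2.403) through (2.405), the constants of the two parallel lines can be computed directly. This sketch assumes the standard forms of the intercepts and slope, with $h_1 = \log[(1-\alpha)/\beta]$ and $h_2 = \log[(1-\beta)/\alpha]$ each divided by $\log[p_2(1-p_1)/p_1(1-p_2)]$; the function name is hypothetical.

```python
import math

def binomial_sprt_lines(p1, p2, alpha, beta):
    """Return (h1, h2, s) for the acceptance line d1 = -h1 + s*n
    and the rejection line d2 = h2 + s*n."""
    denom = math.log(p2 * (1 - p1) / (p1 * (1 - p2)))
    h1 = math.log((1 - alpha) / beta) / denom
    h2 = math.log((1 - beta) / alpha) / denom
    s = math.log((1 - p1) / (1 - p2)) / denom
    return h1, h2, s

# Example plan: tolerate 10% defective, reject at 30% defective.
h1, h2, s = binomial_sprt_lines(0.10, 0.30, alpha=0.05, beta=0.10)
```

With $\alpha + \beta < 1$ and $p_1 < p_2$ both intercepts come out positive and the slope falls between $p_1$ and $p_2$, as stated in the text.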

The sequential procedure is as follows: Items are examined one at a time in
sequence. If at any stage, the cumulative number of defectives found in the
sample thus far taken is less than or equal to $d_1$ given by (2.401), the lot is
accepted; if the cumulative number of defectives is greater than $d_2$ given by
(2.402), the lot is rejected; if neither holds then another observation is taken
and the process continued.

It is easy to show that the sequential test described above is equivalent 
to the following: A variate z takes on the values —s and 1 — s with respective 










probabilities q and p. A sequential test is defined by the two boundaries $-h_1$
and $h_2$ and by the decision function $Z_n = \sum_{\alpha=1}^{n} z_\alpha$ where $z_\alpha$ is the $\alpha$th
observation on z. The sequential test terminates if and only if $Z_n \le -h_1$ or $Z_n \ge h_2$.

As was mentioned above, s lies between 0 and 1.⁵ We shall derive the exact
power and the distribution of n for this sequential test by assuming that $s =
u/v$ where u and v are integers and $u < v$. This restriction is not serious since
every value of s can be approximated to any degree of accuracy by a rational
fraction; and, moreover, when the sequential test is applied in practice, s is
always taken as rational.

Suppose $s = u/v$. Then the sequential test is equivalent to a test in which
the variate z takes on the values $-u$ and $v - u$ with probabilities q and p,
respectively, and the boundaries are given by $-h_1 v$ and $h_2 v$. Let b be the
smallest integer greater than or equal to $h_1 v$ and a be the smallest integer greater
than or equal to $h_2 v$. Then, since u and v are integers, there is no loss in
generality in assuming that the boundaries are $-b$ and a. We shall also assume
that u and v are prime to each other (i.e. the fraction $u/v$ is reduced to lowest
terms) so that the interval $(-b, a)$ is the shortest possible for this test.

The above discussion shows that a sequential test based on the binomial 
can be considered as a special case of the class of tests treated in this section. 
Since z takes only on two values, the linear equations (2.209) assume the simple 
form: 


(2.401)    $-p\, C_{j+u-v-1} + C_{j-1} - q\, C_{j+u-1} = \delta_{jb}, \quad (j = 1, 2, \ldots, a+b-1),$

where $C_k = 0$ when k is negative or greater than $a + b - 2$. In terms of the
C's, the $\xi_{bj}$ and $\xi_{aj}$ are given by

(2.402)    $\xi_{bj} = q\, C_{u-j-1}, \quad (j = 0, 1, \ldots, u-1),$

(2.403)    $\xi_{aj} = p\, C_{a+b+u-v+j-1}, \quad (j = 0, 1, \ldots, v-u-1).$

The conditional generating functions of n are obtained by solving (2.401)
with $\tau p$ substituted for p and $\tau q$ substituted for q.

Since the first $v - u$ and the last u equations in (2.401) contain only two
terms and all the other equations contain only three terms, the C's can be
obtained without too much difficulty by direct substitution provided $a + b$ is
not very large. When $a + b$ is sizeable, a general solution is called for. So far,
the author has been able to obtain this only for the case $u = 1$. This special
case also has been considered by Walter Bartky [4].

Setting $u = 1$ in (2.401) we get

(2.407)    $-p\, C_{j-v} + C_{j-1} - q\, C_j = \delta_{jb}, \quad (j = 1, 2, \ldots, a+b-1),$

where $C_k = 0$ when k is negative or greater than $a + b - 2$.





⁵ In fact, it follows from Theorem 1, section 3.2 below that $p_1 < s < p_2$.




Consider a general set of equations of the form (2.407) with the subscripts
ranging from 1 to an arbitrary integer k. Let the determinant of these equations
be designated by $\Delta_k$. Then by direct expansion it can be shown that
$\Delta_k$ satisfies the difference equation

(2.408)    $\Delta_k = \Delta_{k-1} - p q^{v-1} \Delta_{k-v}$

with the initial conditions

(2.409)    $\Delta_i = 1, \quad i = 1, 2, \ldots, v-1; \qquad \Delta_v = 1 - p q^{v-1}.$

The difference equation (2.408) can be solved by well known methods. We set

(2.410)    $\phi(x) = \sum_{j=1}^{\infty} \Delta_j x^{j-1}$

and then multiply each side of (2.410) by $1 - x + p q^{v-1} x^v$. This yields

(2.411)    $(1 - x + p q^{v-1} x^v)\,\phi(x) = \sum_{j=1}^{\infty} [\Delta_j - \Delta_{j-1} + p q^{v-1} \Delta_{j-v}]\, x^{j-1}.$

But by (2.408) and (2.409) we find that the right-hand side of (2.411) equals
$1 - p q^{v-1} x^{v-1}$. Therefore,

(2.412)    $\phi(x) = \dfrac{1 - p q^{v-1} x^{v-1}}{1 - x + p q^{v-1} x^v}.$

If we expand (2.412) in a power series in x, the coefficient of $x^k$ will be $\Delta_{k+1}$.
This expansion can be performed readily and we get:

(2.413)    $\Delta_{k+1} = \sum_{i=0}^{m_1} (-1)^i\, C_i^{\,k-i(v-1)} (p q^{v-1})^i - \sum_{i=0}^{m_2} (-1)^i\, C_i^{\,k-(i+1)(v-1)} (p q^{v-1})^{i+1}$

where $m_1$ stands for the largest integer less than or equal to $k/v$, $m_2$ stands for the
largest integer less than or equal to $(k - v + 1)/v$ and $C_t^{\,r} = r!/t!\,(r - t)!$.
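The series expansion can be checked against the difference equation. The sketch below assumes the recurrence $\Delta_k = \Delta_{k-1} - pq^{v-1}\Delta_{k-v}$ with $\Delta_0 = 1$ and $\Delta_k = 0$ for $k < 0$, and a closed form obtained by expanding $(1 - pq^{v-1}x^{v-1})/(1 - x + pq^{v-1}x^v)$ term by term; both function names are hypothetical, and the closed form is a reconstruction rather than a quotation of (2.413).

```python
from fractions import Fraction
from math import comb

def delta_recurrence(k, p, v):
    """Delta_k from the difference equation, with Delta_0 = 1."""
    c = p * (1 - p) ** (v - 1)
    d = [Fraction(1)]
    for n in range(1, k + 1):
        d.append(d[n - 1] - c * (d[n - v] if n - v >= 0 else 0))
    return d[k]

def delta_closed(k, p, v):
    """Delta_{k+1} as the coefficient of x^k in the generating function."""
    c = p * (1 - p) ** (v - 1)
    m1 = k // v
    m2 = (k - v + 1) // v  # negative when k < v - 1, giving an empty sum
    first = sum((-1) ** i * comb(k - i * (v - 1), i) * c ** i
                for i in range(m1 + 1))
    second = sum((-1) ** i * comb(k - (i + 1) * (v - 1), i) * c ** (i + 1)
                 for i in range(m2 + 1))
    return first - second
```

Exact rational arithmetic is used so that agreement between the two routes can be asserted without a tolerance.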
Let us define $\Delta_0 = 1$ and $\Delta_k = 0$ when $k < 0$. Then, in terms of the extended
definition of $\Delta_k$, $C_j$ is given by

(2.414)    $C_j = \dfrac{\Delta_j \Delta_{a-1} - \Delta_{j-b}\, \Delta_{a+b-1}}{q^{\,j-b+1}\, \Delta_{a+b-1}}$

for $j = 0, 1, \ldots, a+b-2$. To prove this, we substitute in the left-hand
member of (2.407) the expression for $C_j$ given in (2.414) and get

(2.415)    $\dfrac{\Delta_{a+b-1}(\Delta_{j-b} - \Delta_{j-b-1} + p q^{v-1} \Delta_{j-b-v}) - \Delta_{a-1}(\Delta_j - \Delta_{j-1} + p q^{v-1} \Delta_{j-v})}{q^{\,j-b}\, \Delta_{a+b-1}}.$

But in view of (2.408), (2.409) and the extended definition of $\Delta_k$, the expression
in (2.415) vanishes for all $j \ne b$. When $j = b$, the expression equals 1. Hence,
it follows that (2.414) is the desired solution.
Let $L_p = p[Z_n \le -b]$. Then $L_p$, when plotted against p, gives the operating
characteristic curve for this sequential test. But $L_p = q C_0$. Hence, we have









(2.416)    $L_p = q^b\, \dfrac{\Delta_{a-1}}{\Delta_{a+b-1}}.$
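The operating characteristic for the case $u = 1$ can be evaluated from the determinants. The sketch below assumes the form $L_p = q^b \Delta_{a-1}/\Delta_{a+b-1}$ with $\Delta_k$ generated by $\Delta_k = \Delta_{k-1} - pq^{v-1}\Delta_{k-v}$, $\Delta_0 = 1$; the function names are hypothetical. For $v = 2$ the variate takes the values $-1$ and $+1$, so the result can be compared with the classical ruin probability.

```python
def deltas(kmax, p, v):
    """List [Delta_0, ..., Delta_kmax] from the difference equation."""
    q = 1.0 - p
    d = [1.0]
    for k in range(1, kmax + 1):
        earlier = d[k - v] if k - v >= 0 else 0.0
        d.append(d[k - 1] - p * q ** (v - 1) * earlier)
    return d

def oc_value(p, a, b, v):
    """Probability that the test ends with Z_n <= -b when u = 1."""
    d = deltas(a + b - 1, p, v)
    return (1.0 - p) ** b * d[a - 1] / d[a + b - 1]

def ruin_probability(p, a, b):
    """Classical +/-1 walk: probability of reaching -b before a."""
    rho = (1.0 - p) / p
    return (rho ** b - rho ** (a + b)) / (1.0 - rho ** (a + b))
```

Evaluating `oc_value` over a grid of p traces the operating characteristic curve described in the text.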


As a final remark, we wish to point out that the solution to the sequential
problem presented in this section, when taken in conjunction with Wald's
solution, is of mathematical interest, since it relates each element of the inverse
of a square matrix (designated by A in this section) with the roots of a
polynomial $f(t)$ given by (2.201).


III. CONJUGATE DISTRIBUTIONS


3.1. General discussion. Consider a random variable X with a distribution
density $f(x, \theta)$.⁶ Let $\theta_1$ and $\theta_2$ be two specified values of $\theta$ and let

(3.101)    $z = \log \dfrac{f(x, \theta_2)}{f(x, \theta_1)}.$

For any hypothesis $\theta = \theta'$, let $\phi(t \mid \theta')$ be the moment generating function
of z. That is,

(3.102)    $\phi(t \mid \theta') = \int_{-\infty}^{\infty} e^{zt} f(x, \theta')\, dx.$

Let h be the real non-zero value of t for which $\phi(t \mid \theta') = 1$⁷ and let

(3.103)    $F(x) = e^{zh} f(x, \theta').$

Then $F(x)$ is a distribution density. Following Wald [5], we shall call $F(x)$
and $f(x, \theta')$ conjugate distributions.

The distribution density $F(x)$ depends on $\theta_1$, $\theta_2$ and $\theta'$. In some instances
$F(x)$ will be a member of the class of distributions $f(x, \theta)$. This is the case,
for example, when z is a discrete variate. It is the case also if $\theta' = \theta_1$. For
then $h = 1$ and $F(x) = f(x, \theta_2)$. If $F(x)$ belongs to the class of distributions
$f(x, \theta)$, we shall designate $F(x)$ by $f(x, \theta'')$ and call $\theta'$ and $\theta''$ a conjugate pair.


3.2. Conjugate pairs and the power curve for sequential probability ratio
tests in which the underlying distributions admit a sufficient statistic. Let
$f(x, \theta)$ admit a sufficient statistic and let a sequential test be defined in terms
of the probability ratio z given by (3.101) for some specified hypothesis $\theta_1$ and
alternative hypothesis $\theta_2$ with $\theta_1 < \theta_2$. Let the boundaries be given by $-b$
and a where a and b are positive. Since $f(x, \theta)$ admits a sufficient statistic,
it can be written in the form

(3.201)    $f(x, \theta) = e^{u(x)v(\theta) + r(x) + w(\theta)}.$

The probability ratio z is then given by the simple expression

(3.202)    $z = u(x)[v(\theta_2) - v(\theta_1)] + w(\theta_2) - w(\theta_1).$


⁶ If X is discrete, then $f(x, \theta)$ stands for the probability that $X = x$ when $\theta$ is true.
⁷ See section 2.31 and Lemma II, section 2.32 in [6].




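Let

(3.203)    $b^* = \dfrac{b}{v(\theta_2) - v(\theta_1)},$

(3.204)    $a^* = \dfrac{a}{v(\theta_2) - v(\theta_1)},$

(3.205)    $s = \dfrac{w(\theta_1) - w(\theta_2)}{v(\theta_2) - v(\theta_1)}.$

In terms of $b^*$, $a^*$ and s, the sequential criterion is defined by two parallel
lines⁸

(3.206)    $A_n = -b^* + sn$

(3.207)    $R_n = a^* + sn$

and the decision function $\sum_{\alpha=1}^{n} u(x_\alpha)$. The hypothesis $\theta = \theta_1$ is accepted
whenever $\sum_{\alpha=1}^{n} u(x_\alpha) \le A_n$, and rejected whenever $\sum_{\alpha=1}^{n} u(x_\alpha) \ge R_n$. If
$A_n < \sum_{\alpha=1}^{n} u(x_\alpha) < R_n$, another observation is taken. This process is
continued until one or the other decision is reached.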

In what follows, we shall restrict ourselves to the general class of functions
$f(x, \theta)$ for which the differentiations under the integral sign indicated below
are permissible and $v(\theta)$ is a monotonic function of $\theta$.

Consider the function

(3.208)    $\psi(\theta) = s\,v(\theta) + w(\theta).$

We shall show that $\psi(\theta) = \text{constant}$ has exactly two roots in $\theta$. To this end,
we prove the following theorems.

THEOREM 1. Let $Eu(x) \mid \theta$ be the expected value of u(x) under the assumption
that $\theta$ is true. Then there exists a value of $\theta = \theta_0$ such that (a) $Eu(x) \mid \theta_0 = s$;
(b) $\theta_1 < \theta_0 < \theta_2$ and $Eu(x) \mid \theta_1 < s < Eu(x) \mid \theta_2$ if $v(\theta)$ is an increasing
function of $\theta$, and the inequalities are reversed if $v(\theta)$ is a decreasing function of $\theta$.

PROOF: Assume that $v(\theta)$ is an increasing function of $\theta$. Let $z^* = u(x) - s$
and let $\phi(t \mid \theta)$ be the moment generating function of $z^*$ under the hypothesis
that $\theta$ is true. Then, it is easy to see that $\phi(h \mid \theta_1) = 1$ and $\phi(-h \mid \theta_2) = 1$
where $h = v(\theta_2) - v(\theta_1)$. Since h is positive, it follows by Lemma 1, section
2.6 of [6], that $Ez^* \mid \theta_1 < 0$ and $Ez^* \mid \theta_2 > 0$. Therefore, $Eu(x) \mid \theta_1 < s$ and
$Eu(x) \mid \theta_2 > s$. Moreover, as we shall see in the proof of Theorem 2 below,
$Eu(x) \mid \theta$ is assumed to be a continuous function of $\theta$ and proved to be
monotonically increasing. Hence it must follow that there exists a $\theta = \theta_0$ such that
$Eu(x) \mid \theta_0 = s$ and $\theta_1 < \theta_0 < \theta_2$. This proves the theorem in case $v(\theta)$ is
monotonically increasing. However, the argument is identically the same in case
$v(\theta)$ is monotonically decreasing.


⁸ It is here assumed that $v(\theta_2) - v(\theta_1) > 0$. If this is not the case, then $a^*$ and $b^*$ have
to be interchanged.

THEOREM 2. Let $\psi(\theta)$ be defined as in (3.208). Then $\psi(\theta)$ is a monotonically
increasing function of $\theta$ in the interval $\theta < \theta_0$; assumes a maximum at $\theta = \theta_0$;
and is a monotonically decreasing function of $\theta$ in the interval $\theta > \theta_0$.

PROOF. If we differentiate twice the identity

(3.209)    $\int_{-\infty}^{\infty} e^{u(x)v(\theta) + r(x) + w(\theta)}\, dx = 1$

with respect to $\theta$ we get

(3.210)    $v'(\theta)\,Eu(x) \mid \theta + w'(\theta) = 0$

and

(3.211)    $v''(\theta)\,Eu(x) \mid \theta + w''(\theta) = -[v'(\theta)]^2 \sigma^2_{u(x)}$

where $\sigma^2_{u(x)}$ is the variance of u(x). Also, if we differentiate under the integral
sign the function $Eu(x) \mid \theta$ with respect to $\theta$, we get

(3.212)    $\dfrac{d\,Eu(x) \mid \theta}{d\theta} = v'(\theta)\,\sigma^2_{u(x)}.$

Now by hypothesis, $v(\theta)$ is monotonic in $\theta$. Hence from (3.212) we see that
$Eu(x) \mid \theta$ is also monotonic. Moreover, if $v(\theta)$ is an increasing function of $\theta$,
so is $Eu(x) \mid \theta$, and conversely. Let us assume that $v(\theta)$ increases with $\theta$.
Then for all $\theta < \theta_0$, $Eu(x) \mid \theta < s$ and for all $\theta > \theta_0$, $Eu(x) \mid \theta > s$.
Consequently, we have

(3.213)    $\psi'(\theta) > v'(\theta)\,Eu(x) \mid \theta + w'(\theta)$

for all $\theta < \theta_0$ and

(3.214)    $\psi'(\theta) < v'(\theta)\,Eu(x) \mid \theta + w'(\theta)$

for all $\theta > \theta_0$. But by (3.210) the right-hand side of these inequalities is equal
to zero for all $\theta$. Hence $\psi'(\theta) > 0$ for $\theta < \theta_0$ and $\psi'(\theta) < 0$ for $\theta > \theta_0$. The
same argument holds when $v(\theta)$ is a decreasing function of $\theta$. Now let $\theta = \theta_0$.
Then by (3.210), we see that $\psi'(\theta_0) = 0$. Hence, $\psi(\theta)$ is a maximum at $\theta = \theta_0$.
This proves the theorem.

Let c be any constant $< \psi(\theta_0)$ within the domain of $\psi(\theta)$. Then by Theorem
2, the equation $\psi(\theta) = c$ has two roots in $\theta$. Let these roots be designated by
$\theta'$ and $\theta''$. We now prove the following theorem.

THEOREM 3. Let $z^*$ and $\phi(t \mid \theta)$ be defined as above. Then (a) $\phi(t \mid \theta') = 1$
for $t = v(\theta'') - v(\theta')$; (b) $\phi(t \mid \theta'') = 1$ for $t = v(\theta') - v(\theta'')$; and (c) $\theta'$ and $\theta''$
form a conjugate pair with respect to $z^*$.


PROOF: By definition

(3.215)    $\phi(t \mid \theta') = \int_{-\infty}^{\infty} e^{u(x)[v(\theta') + t] + r(x) + w(\theta') - ts}\, dx.$

Now let $t = v(\theta'') - v(\theta') = h$. Then, in view of the fact that $\psi(\theta') = \psi(\theta'')$,
we get

(3.216)    $\phi(h \mid \theta') = \int_{-\infty}^{\infty} e^{u(x)v(\theta'') + r(x) + w(\theta'')}\, dx = 1.$

In a similar manner, it can be shown that $\phi(-h \mid \theta'') = 1$. Moreover, the same
argument also shows that $f(x, \theta'') = e^{z^* h} f(x, \theta')$. This proves the theorem.

Turning now to the sequential test defined by (3.206) and (3.207), we see that
it is equivalent to a test with the decision function $Z_n^* = \sum_{\alpha=1}^{n} z_\alpha^*$ and the two
boundaries $-b^*$ and $a^*$. Let $L_\theta$ be the probability that the sequential test will
terminate with $Z_n^* \le -b^*$ (i.e. the hypothesis $\theta_1$ is accepted) when $\theta$ is true.
Then (neglecting the fact that at a decision point $Z_n^*$ might exceed $a^*$ or fall
short of $-b^*$), $L_{\theta'}$ and $L_{\theta''}$ are given by (see for example (2.406) in [6])

(3.217)    $L_{\theta'} = \dfrac{e^{(a^* + b^*)h} - e^{b^* h}}{e^{(a^* + b^*)h} - 1}$

and

(3.218)    $L_{\theta''} = \dfrac{e^{-h(a^* + b^*)} - e^{-h b^*}}{e^{-h(a^* + b^*)} - 1} = e^{-h b^*} L_{\theta'}$

where $h = v(\theta'') - v(\theta')$. Thus, we see that the two roots of the equation
$\psi(\theta) = c$ determine two points on the power curve for the sequential test. By
assigning various values to c we obtain as many pairs of points as desired.
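The construction of a conjugate pair from $\psi(\theta) = c$ can be illustrated for a binomial variate, which is an assumed example and not one worked in the text. Here $u(x) = x$, $v(\theta) = \log[\theta/(1-\theta)]$, $w(\theta) = \log(1-\theta)$, so that $\psi(\theta) = s\log\theta + (1-s)\log(1-\theta)$ with its maximum at $\theta_0 = s$. The sketch finds the second root by bisection and evaluates the two points of the power curve under the approximation that neglects the excess over the boundaries; all function names are hypothetical.

```python
import math

def psi(theta, s):
    """psi(theta) = s*v(theta) + w(theta) for the binomial case."""
    return s * math.log(theta) + (1.0 - s) * math.log(1.0 - theta)

def v(theta):
    return math.log(theta / (1.0 - theta))

def conjugate(theta1, s):
    """Root of psi(theta) = psi(theta1) on the other side of theta_0 = s."""
    c = psi(theta1, s)
    near = s                                   # psi(near) > c
    far = 1.0 - 1e-12 if theta1 < s else 1e-12  # assumed: psi(far) < c
    for _ in range(200):
        mid = 0.5 * (near + far)
        if psi(mid, s) > c:
            near = mid
        else:
            far = mid
    return 0.5 * (near + far)

def power_pair(h, a_star, b_star):
    """Approximate L at theta' and theta'' from the root h."""
    t = (a_star + b_star) * h
    l1 = (math.exp(t) - math.exp(b_star * h)) / (math.exp(t) - 1.0)
    l2 = (math.exp(-t) - math.exp(-b_star * h)) / (math.exp(-t) - 1.0)
    return l1, l2

theta1, s = 0.10, 0.20
theta2 = conjugate(theta1, s)
h = v(theta2) - v(theta1)
L1, L2 = power_pair(h, a_star=3.0, b_star=2.0)
```

Repeating this for several starting values of `theta1` gives as many pairs of points on the curve as desired, without computing the moment generating function of $z^*$.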

The above results show that for the class of distributions under consideration,
the real non-zero roots of $\phi(t \mid \theta) = 1$ are obtainable from the roots of $\psi(\theta) =
\text{constant}$. Since $\psi(\theta)$ is completely defined by the form of the distribution
$f(x, \theta)$, the power curve of the sequential test can be obtained without a
knowledge of the moment generating function of $z^*$. This might be advantageous
in some cases.


3.3. The distribution of n under conjugate hypotheses. Let $P_b(n \mid g)$ stand
for the probability that a sequential test will terminate with $Z_n \le -b$ in exactly
n steps when the distribution density of x is g. Let $P_a(n \mid g)$ be similarly defined.

THEOREM 1. If we neglect the excess of $Z_n$ over a and $-b$ at a decision point,

(3.301)    $P_b(n \mid F) = e^{-bh} P_b(n \mid f)$

(3.302)    $P_a(n \mid F) = e^{ah} P_a(n \mid f)$

where f and F are conjugate distributions as defined in (3.103) and h is the non-zero
real value of t for which $\phi(t \mid f)$, the characteristic function of $z = \log f(x, \theta_2)/f(x, \theta_1)$
under the hypothesis f, equals 1.


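PROOF: Since, by definition, $F = e^{zh} f$, it follows that $\psi(t - h \mid F) = \phi(t \mid f)$
where $\psi(t \mid F)$ is the characteristic function of z under the hypothesis F. Let

(3.303)    $\phi(t \mid f) = e^{-\tau}$

where $\tau$ is a pure imaginary. Furthermore, let $t_1(\tau)$ and $t_2(\tau)$ be the roots of
(3.303) such that $\lim_{\tau \to 0} t_1(\tau) = 0$ and $\lim_{\tau \to 0} t_2(\tau) = h$ (see [2], page 289). Then
$t_1(\tau) - h$ and $t_2(\tau) - h$ will be the corresponding roots of

(3.304)    $\psi(t \mid F) = e^{-\tau}.$

Now by the Fundamental Identity we have

(3.305)    $L_f\, e^{-b t_1(\tau)} E_{bf} e^{n\tau} + (1 - L_f)\, e^{a t_1(\tau)} E_{af} e^{n\tau} = 1$

(3.306)    $L_f\, e^{-b t_2(\tau)} E_{bf} e^{n\tau} + (1 - L_f)\, e^{a t_2(\tau)} E_{af} e^{n\tau} = 1$

and

(3.307)    $L_F\, e^{-b[t_1(\tau) - h]} E_{bF} e^{n\tau} + (1 - L_F)\, e^{a[t_1(\tau) - h]} E_{aF} e^{n\tau} = 1$

(3.308)    $L_F\, e^{-b[t_2(\tau) - h]} E_{bF} e^{n\tau} + (1 - L_F)\, e^{a[t_2(\tau) - h]} E_{aF} e^{n\tau} = 1$

where $L_f = P[Z_n \le -b \mid f]$, $E_{bf}$ stands for the expected value of $e^{n\tau}$ under the
hypothesis f and the restriction $Z_n \le -b$; $E_{af}$ stands for the expected value of
$e^{n\tau}$ under the hypothesis f and the restriction $Z_n \ge a$; and the symbols $L_F$,
$E_{bF}$ and $E_{aF}$ are similarly defined.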


By comparing equations (3.305) and (3.306) with (3.307) and (3.308) we
see that

(3.309)    $L_F E_{bF} e^{n\tau} = e^{-bh} L_f E_{bf} e^{n\tau}$

and

(3.310)    $(1 - L_F) E_{aF} e^{n\tau} = e^{ah} (1 - L_f) E_{af} e^{n\tau}.$

Since the above relationships hold for the characteristic functions of n, they must
also hold for the distribution of n. This proves the theorem.
If we set $\tau = 0$ in (3.309) and (3.310) we also get

(3.311)    $L_F = e^{-bh} L_f$

and

(3.312)    $1 - L_F = e^{ah}(1 - L_f).$

In view of (3.311) and (3.312) we see from (3.309) and (3.310) that

(3.313)    $E_{bF} e^{n\tau} = E_{bf} e^{n\tau}$

and

(3.314)    $E_{aF} e^{n\tau} = E_{af} e^{n\tau}.$


From (3.313) and (3.314) we obtain the following rather surprising theorem.

THEOREM 4. Except for the approximation indicated in Theorem 1, the
conditional distribution of n under the restriction that $Z_n \le -b$ as well as the
restriction that $Z_n \ge a$ is identical for the two hypotheses F and f.

The above theorems are of particular interest when F is a member of the
class of distributions f. In any given sequential test the results of Theorem 1
can be used to facilitate the computation of the probabilities of making a
decision. Furthermore, the results of Theorem 4 show that the conditional
distribution of n throws no light on the parameter $\theta$ involved in the distribution
of z. This follows since the conditional distribution of n is identical for the
conjugate pair $\theta'$ and $\theta''$, and, in any practical problem, $\theta'$ and $\theta''$ will represent
opposing hypotheses.

We shall now establish exact relationships of the type considered above when 
the variate z takes on a finite number of integral values. 

Let z take on the values $-r, -r+1, \ldots, -1, 0, 1, 2, \ldots, m$ with $P(z = i) =
p_i$. Furthermore, let $P_i = e^{ih} p_i$ where h is the real non-zero root of

(3.315)    $\sum_{i=-r}^{m} p_i e^{ih} = 1.$

Then the probabilities $P_i$ and $p_i$ are conjugate. We set $e^t = u$ and define
$\phi(u \mid \theta)$ to be the generating function of z under the hypothesis $P(z = i) = \theta_i$.
Then

(3.316)    $\phi(u \mid p) = \sum_{i=-r}^{m} p_i u^i$

and

(3.317)    $\phi(u \mid P) = \sum_{i=-r}^{m} P_i u^i = \sum_{i=-r}^{m} p_i (u e^h)^i.$

Consider a sequential test defined by two boundaries $-b$ and a and a decision
function $Z_n = \sum_{\alpha=1}^{n} z_\alpha$. Let $\xi_{bi}^{\theta}$ and $\xi_{ai}^{\theta}$ stand for the probabilities that $Z_n =
-(b + i)$ and $Z_n = a + i$ respectively under the hypothesis that $\theta_i = P(z = i)$.
Furthermore, let $P_{bi}(n \mid \theta)$ and $P_{ai}(n \mid \theta)$ stand for the probabilities that $Z_n =
-(b + i)$ and $Z_n = (a + i)$ respectively in exactly n steps, under the hypothesis
$\theta_i = P(z = i)$. Also, let the symbols $E_{bi}^{\theta}$ and $E_{ai}^{\theta}$ stand for conditional
expectations under the hypothesis $\theta_i = P(z = i)$ and under the restriction that $Z_n =
-(b + i)$ and $Z_n = a + i$ respectively.

Since z takes on a finite number of integral values, the Fundamental Identity
for the two conjugate hypotheses, p and P, can be written as:





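(3.318)    $\sum_{i=0}^{r-1} \xi_{bi}^{p}\, u^{-(b+i)} E_{bi}^{p} [\phi(u \mid p)]^{-n} + \sum_{i=0}^{m-1} \xi_{ai}^{p}\, u^{a+i} E_{ai}^{p} [\phi(u \mid p)]^{-n} = 1$

and

(3.319)    $\sum_{i=0}^{r-1} \xi_{bi}^{P}\, u^{-(b+i)} E_{bi}^{P} [\phi(u \mid P)]^{-n} + \sum_{i=0}^{m-1} \xi_{ai}^{P}\, u^{a+i} E_{ai}^{P} [\phi(u \mid P)]^{-n} = 1.$

For any real number $\tau$ let $u_1(\tau), u_2(\tau), \ldots, u_{r+m}(\tau)$ be the $r + m$ roots of the
equation

(3.320)    $\phi(u \mid p) = \sum_{i=-r}^{m} p_i u^i = \dfrac{1}{\tau}.$

Then, in view of (3.317) the corresponding roots of

(3.321)    $\phi(u \mid P) = \sum_{i=-r}^{m} P_i u^i = \dfrac{1}{\tau}$

are given by $u_1(\tau)e^{-h}, u_2(\tau)e^{-h}, \ldots, u_{r+m}(\tau)e^{-h}$. Substituting these roots in
(3.318) and (3.319) successively, we get

(3.322)    $\sum_{i=0}^{r-1} \xi_{bi}^{p}\, u_j(\tau)^{-(b+i)} E_{bi}^{p}\tau^n + \sum_{i=0}^{m-1} \xi_{ai}^{p}\, u_j(\tau)^{a+i} E_{ai}^{p}\tau^n = 1$

and

(3.323)    $\sum_{i=0}^{r-1} \xi_{bi}^{P}\, [u_j(\tau)e^{-h}]^{-(b+i)} E_{bi}^{P}\tau^n + \sum_{i=0}^{m-1} \xi_{ai}^{P}\, [u_j(\tau)e^{-h}]^{a+i} E_{ai}^{P}\tau^n = 1$

for $j = 1, 2, \ldots, r + m$. Since the roots $u_j(\tau)$ are assumed to be known, the
unknowns in (3.322) and (3.323) can be solved in terms of these roots provided
the determinant of the equations is different from zero. But in section 2, we
have indirectly shown that for a sufficiently small $\tau$, the determinant is
different from zero. Thus, assuming that the solution has been obtained we see
from (3.322) and (3.323) that

(3.324)    $\xi_{bi}^{P} E_{bi}^{P}\tau^n = e^{-(b+i)h}\, \xi_{bi}^{p} E_{bi}^{p}\tau^n$

and

(3.325)    $\xi_{ai}^{P} E_{ai}^{P}\tau^n = e^{(a+i)h}\, \xi_{ai}^{p} E_{ai}^{p}\tau^n.$

Setting $\tau = 1$, we get

(3.326)    $\xi_{bi}^{P} = e^{-(b+i)h}\, \xi_{bi}^{p}$

and

(3.327)    $\xi_{ai}^{P} = e^{(a+i)h}\, \xi_{ai}^{p}.$

Moreover, if we expand the expressions in (3.324) and (3.325) in a power series
in $\tau$ (which by section 2 is permissible), and compare coefficients of $\tau^n$ we get

(3.328)    $P_{bi}(n \mid P) = e^{-(b+i)h} P_{bi}(n \mid p)$

and

(3.329)    $P_{ai}(n \mid P) = e^{(a+i)h} P_{ai}(n \mid p).$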
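The exact relations between the exit probabilities under two conjugate hypotheses can be checked numerically. The sketch below is illustrative only: it assumes a distribution with negative mean (so that the non-zero root h is positive), finds h by bisection, and obtains the exit probabilities by propagating the distribution of $Z_n$ step by step rather than by the matrix method of section 2. The function names and the example probabilities are hypothetical.

```python
import math

def nonzero_root(p):
    """Bisection for h > 0 with sum_i p[i]*exp(i*h) = 1 (assumes E z < 0)."""
    g = lambda h: sum(pi * math.exp(i * h) for i, pi in p.items()) - 1.0
    lo, hi = 1e-6, 10.0   # assumed bracket: g(lo) < 0 < g(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exit_probs(p, a, b, steps=3000):
    """Map each exit point k (k <= -b or k >= a) to P(Z_n = k)."""
    inside, out = {0: 1.0}, {}
    for _ in range(steps):
        nxt = {}
        for pos, mass in inside.items():
            for i, pi in p.items():
                k = pos + i
                target = out if (k <= -b or k >= a) else nxt
                target[k] = target.get(k, 0.0) + mass * pi
        inside = nxt
    return out

p = {-1: 0.6, 1: 0.3, 2: 0.1}          # E z = -0.1
h = nonzero_root(p)
P = {i: pi * math.exp(i * h) for i, pi in p.items()}  # conjugate probabilities
xi_p = exit_probs(p, a=3, b=2)
xi_P = exit_probs(P, a=3, b=2)
```

Each exit probability under P should equal the corresponding one under p multiplied by $e^{hk}$, where k is the exit point: $k = -(b+i)$ at the lower boundary and $k = a + i$ at the upper.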


REFERENCES 


[1] ABRAHAM WALD, "Sequential tests of statistical hypotheses," Annals of Math. Stat.,
Vol. 16 (1945), pp. 117-186.

[2] ABRAHAM WALD, "On cumulative sums of random variables," Annals of Math. Stat.,
Vol. 15 (1944), pp. 283-296.

[3] STATISTICAL RESEARCH GROUP, COLUMBIA UNIVERSITY, Sequential Analysis of
Statistical Data: Applications, Columbia Univ. Press, 1945.

[4] WALTER BARTKY, "Multiple sampling with constant probability," Annals of Math. Stat.,
Vol. 14 (1943), pp. 363-377.

[5] ABRAHAM WALD, "Some generalizations of the theory of cumulative sums of random
variables," Annals of Math. Stat., Vol. 16 (1945), pp. 287-293.

[6] M. A. GIRSHICK, "Contributions to the theory of sequential analysis, I," Annals of Math.
Stat., Vol. 17 (1946), pp. 123-143.





SUFFICIENT STATISTICAL ESTIMATION FUNCTIONS FOR THE 
PARAMETERS OF THE DISTRIBUTION OF MAXIMUM VALUES 


By BRADFORD F. KIMBALL
New York State Department of Public Service 


1. Summary. The problem of estimating from a sample a confidence region
for the parameters of the distribution of maximum values is treated by setting
up what are called "statistical estimation functions" suggested by the
functional form of the probability distribution of the sample, and finding the moment
generating function of the probability distribution of these estimation functions.
Such an estimate by the method of maximum likelihood is also treated.

A definition of "sufficiency" is proposed for "statistical estimation functions"
analogous to that which applies to "statistics." Also the concept of "stable
statistical estimation functions" is introduced.

By means of a numerical illustration, four methods are discussed for setting
up an approximate confidence interval for the estimated value of x of the
universe of maximum values which corresponds to a given cumulative frequency
.99, for confidence level .95. Two procedures for solving this problem are
recommended as practicable.


2. Introduction. If the universe comprises a set of maximum values of a 
large number of quantities, it has been shown that in many cases the probability 
density function of such a set of values of x is given approximately by 


(2.1)    $f(x) = a\, e^{-t} e^{-e^{-t}}, \quad t = a(x - u), \quad -\infty < x < +\infty,$


where a and u denote parameters, usually unknown [1]. 
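The density can be evaluated directly. The following sketch assumes that (2.1) is the usual double-exponential form $f(x) = a\,e^{-t}e^{-e^{-t}}$ with $t = a(x - u)$; the cumulative distribution $\exp(-e^{-t})$ used for comparison is not stated in the text and is included as the standard integral of that density. Function names are hypothetical.

```python
import math

def max_value_density(x, a, u):
    """Density of the distribution of maximum values, assumed form of (2.1)."""
    t = a * (x - u)
    return a * math.exp(-t - math.exp(-t))

def max_value_cdf(x, a, u):
    """Cumulative distribution obtained by integrating the density above."""
    return math.exp(-math.exp(-a * (x - u)))

def quantile(prob, a, u):
    """Value of x with cumulative frequency prob, by inverting the cdf."""
    return u - math.log(-math.log(prob)) / a
```

For example, `quantile(0.99, a, u)` gives the value of x corresponding to cumulative frequency .99 once a and u are specified.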

This paper is concerned with the problem of estimation of the parameters 
a and u on the basis of sample data. 

The notion of “sufficiency” is fundamental in the problem of estimation, 
since it means that the necessary elements of the sample have been used which 
will result in complete determination of that part of the sample probability 
distribution function involving the unknown parameters to be estimated. 
Unfortunately it does not seem to be possible to set up “sufficient statistics” 
within the usual definition of “statistic” for the above distribution. In this 
investigation the writer was struck by the fact that certain functions of the data 
involving one of the parameters could be used to play a very similar role to a 
set of sufficient statistics for determining a and u, in spite of the fact that one 
function involved the value of a, and hence was not directly determined by the 
data,—and hence not a “statistic.” 

Various statistics have been used in the past to estimate the parameters a 
and u, such as the sample mean, variance, mean deviation and an adjusted 
modal value (see [2] and [3]). For the reason noted above, sufficient statistics 













300 BRADFORD F. KIMBALL 


have not been developed. In order to bridge this impasse and meet the
essentials of the condition of sufficiency, the writer believes that a broader
definition of sufficiency is needed. Such a definition is developed in the following
section.


3. A broader definition of sufficiency. If the reader reexamines the process
of estimating the two parameters of the normal distribution, and the
determination of the two parameter confidence region for them from the statistics
consisting of the sample mean, and the mean square deviation of the sample
values from their mean, he will find that the separate determination of $\bar{x}$ and
$s^2$ is not inherently necessary. The mean a and the variance $\sigma^2$ of the universe
are usually estimated from the pair of equations

$E(\bar{x}) = a, \qquad E(s^2) = (n - 1)\sigma^2/n$

and the boundary of the confidence region is determined from knowledge of
the bivariate distribution of $\bar{x}$ and s, which involves the four variables $\bar{x}$, s,
a, and $\sigma$. The equation of the bounding curve is most easily set up in terms
of transformed variables such as

(3.1)    $U = \sqrt{n}\,(\bar{x} - a)/\sigma, \qquad V = \sqrt{n}\, s/\sigma.$

Then the probability density of U and V is given by

$f(U, V) = (\text{const.})\, V^{n-2} e^{-(U^2 + V^2)/2}$

and with confidence coefficient $\beta$, a bounding curve may be defined implicitly
by the two equations

$\iint f(U, V)\, dU\, dV = \beta, \qquad f(U_1, V_1) = \text{constant}$

where the above integral is taken over the region of the $V \ge 0$ half of the U,V
plane bounded by the curve $f(U_1, V_1) = \text{constant}$.

A range of estimate of the parameters a and $\sigma$ is offered by this confidence
region by virtue of the fact that each point of the region corresponds to a unique
pair of values of a and $\sigma$ for a given set of sample values $O_n(x_i)$, and the fact
that the equation of the bounding curve does not involve the parameters a and $\sigma$.
Thus one arrives at a determinate range of estimate of a and $\sigma$, after the sample
values have been observed. In this paper such functions will be referred to
as statistical estimation functions (see [4]).

The classical idea of sufficiency implies (a) that the estimate be adequate 
for unique determination of the parameters, and (b) that all the sample in- 
formation pertinent to such estimation be used. In the case of “statistics” 
the second requirement has been simply and elegantly formulated by the 
requirement that the probability density function of the sample distribution 













STATISTICAL ESTIMATION FUNCTIONS 301 


factor in such a way that one factor be completely determined by the statistical 
estimates and the parameters of the distribution, and that the remaining factor 
be independent of the parameters to be estimated (see [7], or [5] p. 135). 

It seems to be possible to carry over this formulation to statistical estimation
functions (denoted by $T_i$). Since one or more of the parameters to be estimated,
denoted by $(a_1, a_2, \ldots, a_r)$, are involved in these functions, a requirement that
they be adequate for unique determination of these parameters is obviously
that there be a one-to-one correspondence between the parameter set $(a_1, a_2,
\ldots, a_r)$ and the set of estimation functions $(T_1, T_2, \ldots, T_r)$ in the region of
estimate. This requirement will be referred to as Requirement (1).

It has been pointed out by a referee that some further requirement as to the
independence of the probability density function of $(T_1, T_2, \ldots, T_r)$ relative
to the parameters to be estimated is needed.

If one requires that the p. d. f. of $(T_1, T_2, \ldots, T_r)$ be entirely independent
of the parameters $(a_1, a_2, \ldots, a_r)$ the estimation functions will furnish
"confidence regions" for estimates of the parameters; see the example noted above
for the normal distribution.

However, in some cases the mean values $E(T_i)$ may be independent of the
parameters, while the p. d. f. may not be; for example, estimation functions
for the two parameters of the Pearson Type III distribution formed from the
maximum likelihood functions of that distribution. In such cases, a point
estimation of the parameters is still possible, and would seem to satisfy the
classical requirements of sufficiency.

The author accordingly makes the following proposals: 

(a) Statistical estimation functions that satisfy the first two requirements,
that of one-to-one correspondence with the parameters to be estimated and the
factorability condition, be termed sufficient for estimation of the parameters.
The reasonableness of such a definition is strengthened by the observation
that given a set of "sufficient statistics" in the classical sense, statistical
estimation functions that satisfy the factorability condition can always be formed
from them, and hence they are subject further only to Requirement (1) to make
them sufficient statistical estimation functions under the proposed definition.

(b) Statistical estimation functions that satisfy Requirement (1) and also
have a p. d. f. which is independent of the parameters to be estimated shall be
called stable, a term suggested to the author by a referee.

(c) Statistical estimation functions $T_i$ that satisfy Requirement (1) and are
such that $E(T_i)$, $(i = 1, 2, \cdots, r)$, be independent of the parameters to be
estimated, be called stable in mean, and that similarly, if the modal or median
values of $T_i$ be independent of these parameters, they be called stable in mode,
stable in median, etc.

Thus a definition of sufficiency applicable to statistical estimation functions 
is formulated as follows: 

The term “statistical estimation function” will be used to denote a function 
of the sample values and one or more population parameters, used for purposes 
of statistical estimation. 











302 BRADFORD F. KIMBALL 






Given a universe with probability density function involving m parameters
$a_1, a_2, \cdots, a_m$ in an admissible region $R$, and a set of r statistical estimation
functions $T_i(O_n; a_1, a_2, \cdots, a_m)$ to be used for estimating the r parameters
$a_1, a_2, \cdots, a_r$ relative to the information in a given sample $O_n$. Consider
the conditions:

(1) The functional form $T_i$ insures a one-to-one correspondence between
the points of the r-parameter space $(a_1, a_2, \cdots, a_r)$ contained in $R$ and the points
of the r-space defined by $(T_1, T_2, \cdots, T_r)$ for fixed $O_n(x_i)$ and fixed parameter
values $a_{r+1}, a_{r+2}, \cdots, a_m$.

(2a) It shall be possible to express the probability density function of the
sample $O_n$ as

    $P(O_n) = g_1(T_1, T_2, \cdots, T_r; a_1, a_2, \cdots, a_m) \cdot g_2(O_n; a_{r+1}, a_{r+2}, \cdots, a_m)$,

where the first factor is uniquely determinable for fixed $(a_1, a_2, \cdots, a_m)$ from
the corresponding values of the functions $T_i$, and the second factor is
independent of the parameters to be estimated.

(2b) It shall be possible to express the probability density function of the
sample $O_n$ as

    $P(O_n) = G(T_1, T_2, \cdots, T_r; a_1, a_2, \cdots, a_m)\, g_2(O_n; a_{r+1}, a_{r+2}, \cdots, a_m)$,

where $G(\cdot, \cdots, \cdot\,; a_1, a_2, \cdots, a_m)$ is a functional, depending on $a_1, a_2, \cdots, a_m$,
which in general involves the values of the $T_i$ for values of $a_1, a_2, \cdots, a_m$
different from those appearing in the rest of the identity. (For example,

    $G(T, a) = \exp \int_{a_0}^{a} T(O_n; a')\, da'$.)

(3) The r-variate probability density function of $T_i$ based on
$P(O_n; a_1, a_2, \cdots, a_m)$ shall exist.

Definition A. A set of statistical estimation functions $T_i$ which satisfies
conditions (1) and (2a) will be said to be a sufficient set of estimation functions
for estimating the parameters $a_i$, $(i = 1, 2, \cdots, r)$, relative to the sample $O_n$.

Definition B. A set of statistical estimation functions $T_i$ which satisfies
conditions (1) and (2b) will be said to be a functionally sufficient set of
estimation functions for estimating the parameters $a_i$, $(i = 1, 2, \cdots, r)$, relative to
the sample $O_n$.

Definition C. If the conditions (1) and (3) are satisfied, and the p.d.f. of
$(T_1, T_2, \cdots, T_r)$ is independent of the parameters $a_i$, $(i = 1, 2, \cdots, r)$, the
functions $T_i$ will be said to be stable relative to estimation of these parameters.

Definition D. If the conditions (1) and (3) are met, and $E(T_i)$, $(i = 1, 2, \cdots, r)$,
are independent of the parameters to be estimated, the functions $T_i$
will be said to be stable-in-mean; and similarly if modal or median values of $T_i$
are independent of these parameters, the estimation functions will be said
to be stable-in-mode, stable-in-median, etc.










It is not difficult to prove that a set of maximum likelihood functions

    $L_\alpha = \partial[\log P(O_n; \alpha, \beta)]/\partial\alpha$,    $L_\beta = \partial[\log P(O_n; \alpha, \beta)]/\partial\beta$

under the condition that the second order determinant

    $\begin{vmatrix} L_{\alpha\alpha} & L_{\alpha\beta} \\ L_{\beta\alpha} & L_{\beta\beta} \end{vmatrix}$

exists and does not vanish over the admissible range of $\alpha$ and $\beta$, constitutes a
set of estimation functions for $\alpha$ and $\beta$ that are functionally sufficient and
stable-in-mean under the definition given above. The meeting of Condition (2b)
is demonstrated by the relation

    $\log P(O_n; \alpha, \beta) = \int_{\alpha_0}^{\alpha} L_\alpha(\alpha, \beta_0)\, d\alpha + \int_{\beta_0}^{\beta} L_\beta(\alpha, \beta)\, d\beta + \log P(O_n; \alpha_0, \beta_0)$

since the first two terms on the right depend entirely upon the functions $L_\alpha$
and $L_\beta$, and the third term on the right becomes independent of $\alpha$ and $\beta$, if
$\alpha_0$ and $\beta_0$ are arbitrarily chosen, once for all, in the admissible region $R$.

In general the maximum likelihood functions are not stable estimation
functions, but in many cases by the introduction of suitable factors which appear
in the variance-covariance matrix (see (5.3) and (5.4)) estimation functions
may be formed which satisfy Definition C.


4. Sufficient statistical estimation functions for the distribution of maximum
values. The probability density function for the sample $O_n(x_i)$ drawn from
a universe of maximum values is

(4.1)    $P(O_n) = \alpha^n e^{-\alpha \sum (x_i - u)}\, e^{-\sum e^{-\alpha (x_i - u)}}$

where the summation sign used here and hereinafter refers to summation over
all indices from 1 to n. Let $\bar{x}$ denote the sample mean, and define a new set
of variables $z_i$ by

(4.2)    $z_i = e^{-\alpha x_i}$    $(i = 1, 2, \cdots, n)$,

with mean $\bar{z}$. Also set

    $z_0 = e^{-\alpha u}$.

Recognizing that the variables $2z_i/z_0$ are independently distributed like $\chi^2$
on two degrees of freedom, the probability density function of $\bar{z}$ is given by

(4.3)    $P(\bar{z})\, d\bar{z} = [1/\Gamma(n)]\, e^{-n\bar{z}/z_0} (n\bar{z}/z_0)^{n-1}\, n\, d\bar{z}/z_0$

with mean equal to $z_0$ and variance equal to $z_0^2/n$.
The mean value of $t$ of the original distribution (2.1) is known to be Euler's
constant, which will be denoted by $C$. Thus

(4.4)    $E[\alpha(\bar{x} - u)] = C = .5772157$.













The above considerations point to a set of statistical estimation functions
defined as follows

(4.5)    $X = \sqrt{n}\,[\alpha(\bar{x} - u) - C]$,
         $Y = \sqrt{n}\,[\bar{z}/z_0 - 1]$.

The author was not able to determine the explicit bivariate probability density
function of X and Y, but the moment generating function G may be found
with some degree of facility if the variables $z_i$ are used in (4.1). Using
simplified functions $n\alpha(\bar{x} - u)$ and $n\bar{z}/z_0$,

(4.6)    $G(\theta_1, \theta_2) = \Gamma^n(1 - \theta_1)\,(1 - \theta_2)^{n\theta_1 - n}$.

Clearly $\bar{x}$ and $\bar{z}$ are not statistically independent. The first and second partial
derivatives give

(4.7)    $G_{\theta_1}(0, 0) = nC$,    $G_{\theta_2}(0, 0) = n$,    $G_{\theta_1\theta_1}(0, 0) = n\pi^2/6 + n^2C^2$,
         $G_{\theta_2\theta_2}(0, 0) = n^2 + n$,    $G_{\theta_1\theta_2}(0, 0) = n^2C - n$.

Hence the variances of the marginal distributions are

(4.8)    $\sigma^2[n\alpha(\bar{x} - u)] = n\pi^2/6$,    $\sigma^2(n\bar{z}/z_0) = n$,

and the covariance is equal to $-n$.
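The second moments in (4.8) can be checked numerically. The sketch below is not part of the original paper: it assumes the moment generating function has the form $G(\theta_1, \theta_2) = \Gamma^n(1 - \theta_1)(1 - \theta_2)^{n\theta_1 - n}$ and differentiates $\log G$ at the origin by central differences. The function names and the step size `h` are illustrative choices.

```python
import math

def log_mgf(theta1, theta2, n):
    """log G, assuming G = Gamma(1 - theta1)^n * (1 - theta2)^(n*theta1 - n)."""
    return n * math.lgamma(1.0 - theta1) + (n * theta1 - n) * math.log(1.0 - theta2)

def second_cumulants(n, h=1e-3):
    """Central-difference estimates of the two variances and the covariance
    of n*alpha*(x_bar - u) and n*z_bar/z_0, taken at theta = (0, 0)."""
    k = lambda a, b: log_mgf(a, b, n)
    var1 = (k(h, 0) - 2 * k(0, 0) + k(-h, 0)) / h**2
    var2 = (k(0, h) - 2 * k(0, 0) + k(0, -h)) / h**2
    cov = (k(h, h) - k(h, -h) - k(-h, h) + k(-h, -h)) / (4 * h**2)
    return var1, var2, cov

var1, var2, cov = second_cumulants(57)
print(var1, 57 * math.pi**2 / 6)   # variance of n*alpha*(x_bar - u) against n*pi^2/6
print(var2, cov)                   # should be close to n and -n
```

Second derivatives of $\log G$ give the central second moments directly, which is why the sketch works with the logarithm rather than with G itself.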

Now the marginal distributions rapidly approach normality with increasing
n. The question arises whether the bivariate distribution approaches normality.
One way to prove this is as follows: Consider the moment-generating function
$G_2$ of the statistical functions X and Y defined by (4.5). Following methods
outlined above, with $\theta_3 = \sqrt{n}\,\theta_1$, $\theta_4 = \sqrt{n}\,\theta_2$, it is not difficult to show that
the logarithm of the moment generating function $G_2(\theta_3, \theta_4)$ is given by

    $\log G_2 = (\sqrt{n}\,\theta_3 - n) \log(1 - \theta_4/\sqrt{n}) - \sqrt{n}\,\theta_4 + n \log \Gamma(1 - \theta_3/\sqrt{n}) - \sqrt{n}\, C\theta_3$.

As $n \to \infty$, one notes the relations

(4.9)    $-n \log(1 - \theta_4/\sqrt{n}) - \sqrt{n}\,\theta_4 = \theta_4^2/2 + o_1(\sqrt{n})$,
         $n \log \Gamma(1 - \theta_3/\sqrt{n}) - \sqrt{n}\, C\theta_3 = (\theta_3^2/2)(\pi^2/6) + o_2(\sqrt{n})$,
         $\sqrt{n}\,\theta_3 \log(1 - \theta_4/\sqrt{n}) = -\theta_3\theta_4 + o_3(\sqrt{n})$,

where $o_i(\sqrt{n})$ denote functions that approach zero as $1/n \to 0$, uniformly
for $\theta_3$ and $\theta_4$ in the neighborhood of zero. The limit

    $\lim_{n \to \infty} \log G_2 = \tfrac{1}{2}[\theta_4^2 - 2\theta_3\theta_4 + (\pi^2/6)\theta_3^2]$

is recognized as the logarithm of the moment generating function of a normal
bivariate distribution.







Thus the bivariate probability distribution function of the estimation functions
X and Y approaches the normal bivariate distribution with zero means and
variance-covariance matrix

(4.10)    $\begin{Vmatrix} \pi^2/6 & -1 \\ -1 & 1 \end{Vmatrix}$

as n increases without limit, and the means and second order moments thus
indicated hold precisely for all values of n.

The functions X and Y satisfy Condition (1) for sufficiency relative to
estimation of the parameters $\alpha$ and u provided $\alpha$ and u can be expressed as single valued
functions of X and Y. A condition for this is that the Jacobian of the
transformation shall not vanish. This Jacobian may be reduced to

    $[(n\alpha\bar{z})/z_0]\,[\bar{x} - (\sum x_i e^{-\alpha x_i})/(\sum e^{-\alpha x_i})]$.

Let $x_i$ be ordered so that $x_i \le x_{i+1}$. Then for $\alpha > 0$, the second term
constitutes a weighted mean with positive weights which monotonically decrease as i
increases, when the inequality $x_i < x_{i+1}$ holds. Hence unless all $x_i$ are equal,
this weighted mean is less algebraically than $\bar{x}$. Condition (2a) for sufficiency
is clearly met by these functions. Thus one concludes that for $\alpha > 0$, and the
case that not all $x_i$ are equal, the estimation functions X and Y constitute a sufficient
set of estimation functions for the parameters $\alpha$ and u of distribution (2.1). Since
the moment generating function (see (4.6)) is independent of $\alpha$ and u, these
functions are also stable estimation functions.
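The sign argument for the Jacobian can be illustrated with a few lines of code. The following is a minimal sketch, not taken from the paper: `jacobian_bracket` is a hypothetical helper that evaluates the bracketed factor $\bar{x} - (\sum x_i e^{-\alpha x_i})/(\sum e^{-\alpha x_i})$. The remaining factor $n\alpha\bar{z}/z_0$ is positive for $\alpha > 0$, so the sign of the Jacobian is the sign of this bracket.

```python
import math

def jacobian_bracket(xs, alpha):
    """Return x_bar minus the weighted mean of the x_i with weights exp(-alpha*x_i)."""
    x_min = min(xs)
    # Shifting by x_min multiplies every weight by the same factor, which
    # cancels in the ratio and avoids underflow for large alpha * x.
    weights = [math.exp(-alpha * (x - x_min)) for x in xs]
    weighted_mean = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return sum(xs) / len(xs) - weighted_mean

print(jacobian_bracket([1.0, 2.0, 3.0], 0.5))   # positive: the values are not all equal
print(jacobian_bracket([5.0, 5.0, 5.0], 0.5))   # zero: all values are equal
```

Because larger observations receive smaller weights when $\alpha > 0$, the weighted mean falls below the plain mean, which is the inequality used in the text.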


5. Maximum likelihood estimation functions. General theory points to
the use of the method of maximum likelihood as giving the most efficient solution
(see [5]). With

(5.1)    $f(x) = \alpha e^{-\alpha(x-u)}\, e^{-e^{-\alpha(x-u)}}$

the maximum likelihood estimation functions are

(5.2)    $L_u = -n\alpha(\bar{z}/z_0 - 1)$,
         $L_\alpha = n\{1/\alpha - (\bar{x} - u) - \partial(\bar{z}/z_0)/\partial\alpha\}$

with variance-covariance matrix

(5.3)    $\begin{Vmatrix} n\alpha^2 & n(1 - C) \\ n(1 - C) & (n/\alpha^2)[\pi^2/6 + (1 - C)^2] \end{Vmatrix}$.

Thus with

(5.4)    $X = \sqrt{n}\,(\bar{z}/z_0 - 1)$,    $Y = \sqrt{n}\,[\alpha(u - \bar{z} e^{\alpha u}(u + \bar{z}_\alpha/\bar{z})) - (\alpha\bar{x} - 1)]/B$,
         $B = \sqrt{\pi^2/6 + (1 - C)^2}$,

where

    $\bar{z}_\alpha = \partial[\sum e^{-\alpha x_i}/n]/\partial\alpha$,

the bivariate distribution of X and Y rapidly approaches normality with
increasing n, with zero means, unit variances, and correlation coefficient given
by (negative, since sign of $L_u$ has been reversed)

(5.5)    $r = -(1 - C)/\sqrt{\pi^2/6 + (1 - C)^2}$.

With non-vanishing Jacobian, X and Y constitute a sufficient set of estimation
functions for the parameters $\alpha$ and u (see (3.2) above). Furthermore the unit
variances and correlation value given above are exact for all values of n. By setting
up the moment generating function it is not difficult to show that these functions
are also stable estimation functions for all values of n.

The theory of maximum likelihood further shows that if $\hat{u}$ and $\hat{\alpha}$ are defined
as the u and $\alpha$ solutions of the equations

(5.6)    $L_u = 0$,    $L_\alpha = 0$

the distribution of $\sqrt{n}\,(\hat{u} - u)$ and $\sqrt{n}\,(\hat{\alpha} - \alpha)$ will approach normality
asymptotically with zero means and variance-covariance matrix which is the reciprocal
of the above matrix (multiplied by n); namely,

(5.7)    $\begin{Vmatrix} (1/\alpha^2)[1 + (1 - C)^2/(\pi^2/6)] & -(1 - C)/(\pi^2/6) \\ -(1 - C)/(\pi^2/6) & \alpha^2/(\pi^2/6) \end{Vmatrix}$.

6. Numerical applications. As an illustration of the application of the
methods outlined above for determining the parameters of the distribution of
maximum values from an observed sample, data are taken from the 57 year
record of annual maximum flood flows previously used as an illustration by the
author ([6] p. 324). There is some evidence to indicate that such a series
follows approximately the distribution of maximum values. At any rate the
series serves pretty well as a numerical illustration.

Confidence regions for u and $\alpha$ can be determined by four methods based
upon the preceding theory. In order to make the numerical illustration more
cogent, we shall answer the following question by each of the methods. What
is the confidence interval (with confidence level .95) for annual flood x
corresponding to a cumulated frequency of .99 (often referred to as a 100 yr. flood) based
upon our observed 57 yr. sample, under the assumption that the distribution
of maximum values (2.1) applies to this data?

Method 1. (Based on estimation functions of section 4.) In this case the
statistical estimation functions $X_1$ and $Y_1$ defined from (4.5) by $X_1 = X\sqrt{6}/\pi$,
$Y_1 = Y$, are used. The "best values" of u and $\alpha$ are taken as the solutions
of $X_1 = 0$, $Y_1 = 0$, found by trial and error. As a starting point values of u
and $\alpha$ may be estimated from $X_1 = 0$ and the standard deviation of $x_i$ (see
[2] or [6]), the mean deviation of $x_i$, or an adjusted modal value (see [3]). A
few trials give

    $\hat{u} = 179.7$,    $\hat{\alpha} = .01998$.

Approximating the distribution function of $X_1$ and $Y_1$ by the limiting normal
bivariate distribution (4.10), with confidence level of .95 the equation of the
bounding constant probability ellipse is found to be

(6.1)    $X_1^2 + (1.559) X_1 Y_1 + Y_1^2 = 2.3491$

where the constants are independent of the sample values. This ellipse, by
virtue of the one-to-one correspondence between $(X_1, Y_1)$ and $(u, \alpha)$, bounds
u and $\alpha$ based upon the observed sample (see [4]).
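The constants of the bounding ellipse follow from the correlation coefficient and the confidence level alone. The sketch below is an added illustration under stated assumptions: for a standardized bivariate normal pair with correlation $\rho$, the region $(X^2 - 2\rho XY + Y^2)/(1 - \rho^2) \le q$ has probability equal to the confidence level when q is the chi-square quantile on two degrees of freedom, which equals $-2\log(1 - \text{level})$. The helper name `ellipse_constants` is hypothetical. The same routine is applied to the correlation (5.5) used for ellipse (6.4) of Method 2.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant, denoted C in the text

def ellipse_constants(rho, level=0.95):
    """Return (c, k) for the ellipse X^2 + c*X*Y + Y^2 = k of a standardized
    bivariate normal pair with correlation rho."""
    # For 2 degrees of freedom the chi-square quantile has a closed form.
    chi2_quantile = -2.0 * math.log(1.0 - level)
    return -2.0 * rho, (1.0 - rho**2) * chi2_quantile

# Method 1: correlation implied by matrix (4.10) once X is scaled to unit variance.
rho_method1 = -math.sqrt(6.0) / math.pi
# Method 2: correlation coefficient (5.5).
rho_method2 = -(1.0 - EULER_C) / math.sqrt(math.pi**2 / 6.0 + (1.0 - EULER_C)**2)

c1, k1 = ellipse_constants(rho_method1)
c2, k2 = ellipse_constants(rho_method2)
print(f"(6.1): cross term {c1:.3f}, right side {k1:.4f}")
print(f"(6.4): cross term {c2:.5f}, right side {k2:.4f}")
```

Neither constant involves the sample, in agreement with the remark that the constants are independent of the sample values.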

For cumulated frequency .99, the distribution of maximum values (2.1)
yields

    $t = \alpha(x - u) = 4.60015$.

Thus the analytic problem is that of determining the maximum and minimum
value of

(6.2)    $x = g(u, \alpha) = 4.60015/\alpha + u$

which occurs on the ellipse (6.1).*

The writer solved this graphically. It was found necessary to compute
three values of $\bar{z}$, at $\alpha$ = .01, .015 and .025, in addition to the value of $\bar{z}$ at
$\alpha$ = .01998 previously found. From these computations the curves $\alpha$ = .01,
$\alpha$ = .015, $\alpha$ = .01998 and $\alpha$ = .025 were drawn on the chart of the ellipse (6.1).
The u = const. curves were quite easily determined by points on the $\alpha$ = const.
curves found from their $X_1$ coordinates, which are linear functions of u (see (4.5)).
The extreme values of $g(u, \alpha)$ will be found to occur near the extreme values of $\alpha$
on the ellipse. A construction of several u = const. curves near these extremes
enables one to determine several successive values of $g(u, \alpha)$ at points where
these curves cross the ellipse. The answers were

(6.3)    Max. $g(u, \alpha)$ = 507.4 at u = 192, $\alpha$ = .01459,
         Min. $g(u, \alpha)$ = 360.0 at u = 172, $\alpha$ = .02447,
         and $g(\hat{u}, \hat{\alpha})$ = 409.9.
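The reduced variate and the central estimate of the 100 yr. flood can be reproduced directly. The sketch below is an added illustration. It assumes that distribution (2.1) has cumulative form $\exp(-e^{-\alpha(x-u)})$, which is the integral of the density (5.1). The function names are hypothetical.

```python
import math

def reduced_variate(cum_freq):
    """Solve exp(-exp(-t)) = cum_freq for t = alpha * (x - u)."""
    return -math.log(-math.log(cum_freq))

def flood_level(u, alpha, cum_freq=0.99):
    """g(u, alpha) of (6.2): the flood x having the given cumulated frequency."""
    return reduced_variate(cum_freq) / alpha + u

print(reduced_variate(0.99))            # the constant appearing in (6.2)
print(flood_level(179.7, 0.01998))      # central estimate by Method 1
print(flood_level(180.6, 0.01924))      # central estimate by Method 2
```

The extreme values in (6.3) require the graphical search on the ellipse described in the text. The sketch covers only the evaluation of g at given u and $\alpha$.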


Method 2. (Based on maximum likelihood statistical estimation functions
(5.4)). For purposes of comparison the writer carried through the solution
using the maximum likelihood estimation functions $X_2$ and $Y_2$ defined by (5.4).


* Since with non-vanishing Jacobian of $(X_1, Y_1)$ relative to $(u, \alpha)$, no singular point of
the $(u, \alpha)$ coordinate system can lie within the ellipse, it is clear from the form of the
function $g(u, \alpha)$ that its maximum and minimum values will lie on the boundary of the ellipse.
A similar remark applies to Methods 2-4.













In this case the equation of the bounding ellipse was

(6.4)    $X_2^2 + (.62614) X_2 Y_2 + Y_2^2 = 5.4042$.

The determination of the network of $\alpha$ = const., u = const. curves was much
more complicated in this case. The results were

(6.5)    Solution of X = 0, Y = 0, gave $\hat{u}$ = 180.6, $\hat{\alpha}$ = .01924; $g(\hat{u}, \hat{\alpha})$ = 419.7
         Max. $g(u, \alpha)$ = 509.5 at u = 187, $\alpha$ = .01426
         Min. $g(u, \alpha)$ = 364.4 at u = 172, $\alpha$ = .02391.

The slightly smaller range of estimate of $g(u, \alpha)$ resulting from the use of
the second method was forecast from the general theory, which predicts a
narrowing of range of variation of u and $\alpha$ for the same confidence level. Both bivariate
distributions involve exact moments of the first and second degree for finite n,
and both approach normality rapidly with increasing n. Hence comparable
results were to be expected. Of course the form of the function $g(u, \alpha)$ in relation
to the different types of estimation functions used in the two cases might modify
the comparability of the results.

Method 3. (Based on limiting distribution of maximum likelihood statistics
$\hat{u}$ and $\hat{\alpha}$, with variances unknown.) The use of the limiting distribution of
the estimation functions $\sqrt{n}(\hat{u} - u)$, $\sqrt{n}(\hat{\alpha} - \alpha)$ led to results which were
not entirely expected by the author. Taking

(6.6)    $X_3 = A\alpha(\hat{u} - u)/B$,    $Y_3 = A(\hat{\alpha}/\alpha - 1)$,
         $A = \pi\sqrt{n}/\sqrt{6}$,    $B = \sqrt{\pi^2/6 + (1 - C)^2}$,

with

    $r = -(1 - C)/B$,

the equation of the bounding ellipse is the same as (6.4) (no reversal of sign of
r occurs because the sign of r in (6.4) was reversed by reversing the sign of $L_u$ in (5.4)).

Using the inverse method where the range in u and $\alpha$, with $\hat{u}$ = 180.6, $\hat{\alpha}$ =
.01924, is determined from the range of $(X_3, Y_3)$ within the ellipse (6.4), the
maximum and minimum obtained for $g(u, \alpha)$ were

(6.7)    Max. $g(u, \alpha)$ = 490.2 at u = 193.2, $\alpha$ = .01549
         Min. $g(u, \alpha)$ = 353.8 at u = 174.0, $\alpha$ = .02558.

This result does not agree closely with the previous results. The reason for
this discrepancy may be that since the variances indicated by (5.7) are not
exact for finite n, a variation of $\alpha$ from the central value predicted by (5.6) tends
to exaggerate the departure of the distribution of X and Y from the limiting
normal distribution through its effect upon the variances. The plausibility
of such an explanation is strengthened by the numerical results of a solution
of our problem by Method 4.







Method 4. (Based on limiting distribution of maximum likelihood statistics
$\hat{u}$ and $\hat{\alpha}$, with variances estimated by taking $\alpha = \hat{\alpha}$ as observed from the sample.)
In this case the unknown variances are estimated by taking $\alpha = \hat{\alpha}$ as observed
from the sample studied. In order to avoid confusion let $\alpha_0$ denote this value
of $\alpha$ as used in the variance formulae. Thus the estimating functions $X_4$ and
$Y_4$ become

(6.8)    $X_4 = A\alpha_0(\hat{u} - u)/B$,    $Y_4 = A(\hat{\alpha} - \alpha)/\alpha_0$

and the approximating distribution of $(X_4, Y_4)$ is taken as the same limiting
normal distribution used in Method 3. With

    $\hat{u}$ = 180.6,    $\alpha_0 = \hat{\alpha}$ = .01924

the extreme values of $g(u, \alpha)$ on the ellipse were

    Max. $g(u, \alpha)$ = 507.4 at u = 188.6, $\alpha$ = .01443
    Min. $g(u, \alpha)$ = 362.8 at u = 169.7, $\alpha$ = .02382.

These results agree closely with the results obtained by Methods 1 and 2.
The confidence intervals in $g(u, \alpha)$ obtained were, in summary,

(6.9)    Method 1    360.0 to 507.4
         Method 2    364.4 to 509.5
         Method 3    353.8 to 490.2
         Method 4    362.8 to 507.4.

From the analysis of the four methods presented above, one might
recommend the following two procedures for finding the confidence interval for x
in a problem of the above description, as practicable:

Procedure 1. Use Method 1.

Procedure 2. Determine the maximum likelihood estimates $\hat{u}$ and $\hat{\alpha}$ from
(5.6) by trial and error. Then use Method 4. Presumably the second procedure
would be more open to question, especially for small values of n.


REFERENCES 


[1] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution of
the smallest and largest member of a sample," Proc. Camb. Phil. Soc., Vol. 24
(1928).

[2] E. J. Gumbel, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1941),
pp. 163-190, formulas (27) and (28).

[3] E. J. Gumbel, "Statistical control-curves for flood discharges," Trans. Am. Geophysical
Union, Report of Section of Hydrology, 1942, p. 489 ff., formulas (6) and (16).

[4] A. Wald, "The fitting of straight lines if both variables are subject to error," Annals
of Math. Stat., Vol. 11 (1940), pp. 284-300, especially footnote 13 page 290 and
statement (2) at top of page 293.

[5] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943, p. 139.

[6] B. F. Kimball, "Limited type of primary probability distribution applied to annual
maximum flood flows," Annals of Math. Stat., Vol. 13 (1942), pp. 318-325.

[7] J. Neyman, "Su un teorema concernente le cosiddette statistiche sufficienti," Giornale
dell'Istituto Italiano degli Attuari, Vol. 6 (1935), p. 320 ff.





ON FUNCTIONS OF SEQUENCES OF INDEPENDENT CHANCE VECTORS 
WITH APPLICATIONS TO THE PROBLEM OF THE 
“RANDOM WALK” IN k DIMENSIONS 


By D. BLACKWELL AND M. A. GIRSHICK
Howard University and U. S. Department of Agriculture


1. Summary. Consider a sequence $\{X_i\}$ of independent chance vectors in k
dimensions with identical distributions, and a sequence of mutually exclusive
events $S_1, S_2, \cdots$, such that $S_i$ depends only on the first i vectors and $\sum P(S_i)$
= 1. Let $\varphi_i$ be a real or complex function of the first i vectors in the sequence
satisfying conditions: (1) $E(\varphi_i) = 0$ and (2) $E(\varphi_j \mid X_1, \cdots, X_i) = \varphi_i$ for $j > i$.
Let $\varphi = \varphi_i$ and $n = i$ when $S_i$ occurs. A general theorem is proved which gives
the conditions $\varphi_i$ must satisfy such that $E\varphi = 0$. This theorem generalizes
some of the important results obtained by Wald for k = 1. A method is also
given for obtaining the distribution of $\varphi$ and n in the problem of the "random
walk" in k dimensions for the case in which the components of the vector take
on a finite number of integral values.


2. A basic theorem.


2.1 Let $\{X_i\} = \{(X_{1i}, X_{2i}, \cdots, X_{ki})\}$ be a sequence of independent
k-dimensional chance variables with identical distributions. Let $S_1, S_2, S_3, \cdots$ be
mutually exclusive events such that (1) $S_i$ depends only on $X_1, X_2, \cdots, X_i$,
and (2) $\sum P(S_i) = 1$. Let $\varphi_i(X_1, X_2, \cdots, X_i)$ be a sequence of real or complex
variables satisfying the following two conditions:

Condition 1: $E(\varphi_i) = 0$ for all i.

Condition 2: $E(\varphi_j \mid X_1, X_2, \cdots, X_i) = \varphi_i$ for all $j > i$, where
$E(\varphi_j \mid X_1, X_2, \cdots, X_i)$ stands for the expected value of $\varphi_j$ under the condition
that $X_1, X_2, \cdots, X_i$ are held constant. Define $\varphi = \varphi_i$ and $n = i$ when the event $S_i$ occurs.
We shall assume that $E(n)$ is finite.

A problem of central importance in sequential theory may be formulated as
follows: What conditions must $\varphi_i$ satisfy so that $E(\varphi)$ exists and equals zero?
We shall prove the following:

THEOREM 2.1. If there exists a function $f(x_1, x_2, \cdots, x_k) \ge 0$ such that (a)
$E[f(X_i)]$ is finite and (b) $|\varphi_i| \le \sum_{d=1}^{n} f(X_d)$ when $n \ge i$, then $E(\varphi)$ exists and
equals zero.

Before proceeding to the proof, we consider two consequences of this theorem.

I. Assume that $E(X_{ui}) = a_u$. Let $\varphi_i = \sum_{j=1}^{i} (X_{uj} - a_u)$. It is easily
verified that $\varphi_i$ satisfies conditions 1 and 2. We set $f(x_1, x_2, \cdots, x_k) = |x_u - a_u|$.
Then Theorem 2.1 is applicable and we get $E\varphi = 0$. Now $\varphi = W_u - na_u$,
where $W_u = \sum_{i=1}^{n} X_{ui}$. Hence we have

(2.11)    $E(W_u) = a_u E(n)$.

The relationship (2.11) has been proved for k = 1 by Wald [3] and
subsequently, under somewhat more generalized conditions, by one of the authors [4].


1 Chance variables $\varphi_i$ satisfying condition 2 have been extensively studied by P. Lévy
[1] and J. L. Doob [2].
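Relation (2.11) can be checked numerically for k = 1. The sketch below is an added illustration rather than the authors' method. It takes a walk with integer increments that stops the first time the partial sum leaves the open interval (-b, a), propagates the probability mass of the unstopped walk step by step, and accumulates E(W) and E(n). The function name, the tolerance `tol` and the example increments are all assumptions made for the illustration.

```python
def stopped_walk_moments(steps, a, b, tol=1e-15):
    """Return (E(W), E(n)) for a walk started at 0 that stops when W <= -b or W >= a.

    steps maps each integer increment to its probability.
    """
    mass = {0: 1.0}          # probability of each interior position, walk not yet stopped
    e_w = e_n = 0.0
    i = 0
    while mass and sum(mass.values()) > tol:
        i += 1
        new_mass = {}
        for pos, p in mass.items():
            for inc, q in steps.items():
                w = pos + inc
                if w <= -b or w >= a:
                    # The walk stops at step i with stopped sum w.
                    e_w += w * p * q
                    e_n += i * p * q
                else:
                    new_mass[w] = new_mass.get(w, 0.0) + p * q
        mass = new_mass
    return e_w, e_n

steps = {-1: 0.5, 2: 0.5}                     # mean increment 0.5
e_w, e_n = stopped_walk_moments(steps, a=5, b=4)
print(e_w, 0.5 * e_n)                         # the two numbers should agree
```

The loop neglects the probability mass remaining once it falls below `tol`, so the agreement holds up to that truncation and to rounding.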
k ° 

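II. Let $t_1, t_2, \cdots, t_k$ be any real or complex numbers for which
$a = E\,e^{\sum_{r=1}^{k} t_r X_{ri}}$ is finite and $|a| \ge 1$. We assume that there exists a
positive constant M such that

(2.12)    [inequality illegible in the source scan]

when $n > m$. Let

(2.13)    $\varphi_i = a^{-i}\, e^{\sum_{r=1}^{k} t_r \sum_{j=1}^{i} X_{rj}} - 1$

so that

(2.14)    $\varphi = a^{-n}\, e^{\sum_{r=1}^{k} t_r W_r} - 1$

where $W_r$ is defined as above. It is easy to show that $\varphi_i$ satisfies conditions
1 and 2. Now, in view of (2.12), when $n \ge i$

(2.15)    $|\varphi_i| \le |a|^{-i}\, \left| e^{\sum_{r=1}^{k} t_r \sum_{j=1}^{i} X_{rj}} \right| + 1 \le 1 + R\, e^{\sum_{r=1}^{k} \tau_r X_{ri}}$

where $\tau_r$ is the real part of $t_r$ and R [defining expression illegible in the source scan]
is a fixed positive constant. Then, letting

(2.16)    $f(x_1, x_2, \cdots, x_k) = 1 + R\, e^{\sum_{r=1}^{k} \tau_r x_r}$

we may apply Theorem 2.1 and obtain

(2.17)    $E\left(a^{-n}\, e^{\sum_{r=1}^{k} t_r W_r}\right) = 1$

which is a generalization of the Fundamental Identity proved by Wald [5]
for the case k = 1.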

PROOF OF THEOREM 2.1. Assume $\varphi_i$ is real. Define chance variables $N_m$
inductively as follows: $N_0 = 0$. Assuming $N_0, \cdots, N_m$ defined, define $N_{m+1} =
N_m + n(X_{N_m+1}, X_{N_m+2}, \cdots)$. Also let $n_m = N_m - N_{m-1}$ and
$y_m = f(X_{N_{m-1}+1}) + \cdots + f(X_{N_m})$. It can be shown by induction that $N_m$ is defined for all m
with probability one, and that $\{n_m\}$, $\{y_m\}$ are sequences of independent chance
variables with identical distributions. Clearly $n_1 = n$.

The Strong Law of Large Numbers asserts that if $z_1, z_2, \cdots$ are independent
chance variables with identical distribution, then $\lim_{m \to \infty} (z_1 + \cdots + z_m)/m = c$
with probability one if and only if $Ez_1$ exists and equals c.

It follows that, with probability one

(2.18)    $\lim_{N \to \infty} [f(X_1) + \cdots + f(X_N)]/N = E[f(X_1)]$

and

(2.19)    $\lim_{m \to \infty} (n_1 + \cdots + n_m)/m = \lim_{m \to \infty} N_m/m = E(n)$.

Since $(y_1 + \cdots + y_m)/N_m = [f(X_1) + \cdots + f(X_{N_m})]/N_m$ is a subsequence of
$[f(X_1) + \cdots + f(X_N)]/N$, we have with probability one,

(2.20)    $\lim_{m \to \infty} (y_1 + \cdots + y_m)/N_m = E[f(X_1)]$

so that

(2.21)    $\lim_{m \to \infty} (y_1 + \cdots + y_m)/m = E[f(X_1)]E(n)$.

Consequently, $E(y_1)$ exists and equals $Ef(X_1)E(n)$. Since $|\varphi| \le y_1$, $E(\varphi)$
exists. Also using conditions (2) and (b) which were imposed on $\varphi_i$ we have

(2.22)    $\left| \int_{S_1 + \cdots + S_i} \varphi\, dp \right| = \left| \sum_{j=1}^{i} \int_{S_j} \varphi_j\, dp \right| = \left| \sum_{j=1}^{i} \int_{S_j} \varphi_i\, dp \right| = \left| -\int_{n > i} \varphi_i\, dp \right| = \left| \sum_{j > i} \int_{S_j} \varphi_i\, dp \right|$
          $\le \sum_{j > i} \int_{S_j} |\varphi_i|\, dp \le \int_{n > i} y_1\, dp$

which approaches zero as $i \to \infty$. This completes the proof.

If $\varphi_j$ is a complex valued function, Theorem 2.1 still holds. For writing
$\varphi_j = g_j + ih_j$, Condition 2 becomes $E(g_p + ih_p \mid X_1, \cdots, X_j) = g_j + ih_j$
when $p > j$. Hence

(2.23)    $E(g_p \mid X_1, \cdots, X_j) = g_j$

and

(2.24)    $E(h_p \mid X_1, \cdots, X_j) = h_j$

when $p > j$. Since $|g_j| \le |\varphi_j|$ and $|h_j| \le |\varphi_j|$ and $\varphi_j$ satisfies condition
(b) we may apply Theorem 2.1 and get

(2.25)    $E(g) = E(h) = 0$.

Hence $E\varphi = 0$.







3. Applications to the problem of the random walk in k dimensions²


3.1. A theorem concerning decision points. Let $\{X_i\} = \{(X_{1i}, \cdots, X_{ki})\}$
be a sequence of k-dimensional chance vectors with identical distributions. We
assume that $X_{ji}$ $(j = 1, 2, \cdots, k)$ take on a finite number of integral values
ranging from $-r_j$ to $m_j$ inclusive, where $r_j$ and $m_j$ are positive integers. We
remark that any distribution can be approximated to any degree of accuracy
by the distribution of a variate whose values are integral multiples of a constant
d, which can be taken as the unit of measurement.

Let $P_{u_1 u_2 \cdots u_k}$ be the probability that $X_i = (u_1, u_2, \cdots, u_k)$. We define
$W_{ui} = \sum_{j=1}^{i} X_{uj}$ and set $U_i = (W_{1i}, W_{2i}, \cdots, W_{ki})$. Then $\{U_i\}$ represents
a sequence of points with integral coordinates in a k-dimensional space $S_k =
\{(y_1, y_2, \cdots, y_k)\}$. Let R be an arbitrary bounded region in $S_k$. We shall
assume, without loss of generality, that the origin is an interior point of R.
We now define a random variable n as the smallest subscript i of the sequence
$\{U_i\}$ for which $U_i$ is either a boundary point or an exterior point of R. We set
$U_n = W = (W_1, W_2, \cdots, W_k)$ and designate W as a decision point of R.
Clearly, the number of decision points is finite.

The random variables n and W can be interpreted as follows: Consider a
point Q which at the time t = 0 is at the origin. At successive intervals of
time t = 1, 2, $\cdots$, the point Q moves with integral components in $S_k$, the
direction and distance of the motion being determined by chance. The point comes
to rest as soon as, but not before, it either reaches the boundary of R or falls
outside of R. Let $U_t$ be the co-ordinates of the point Q at time t. Then n
represents the length of time it takes Q to come to rest, and W represents a
possible resting point.³

We shall be concerned with the problem of finding the probability distribution
of n and W. These will obviously depend on the shape of the region R. In
what follows we shall restrict ourselves to the class of regions R which have
the property that the intersection of any line parallel to the axes with R is an
open interval. In view of the fact that W has integral coordinates, we can
without any loss of generality replace this class of regions by an equivalent class
which are bounded by simple polygonal closed surfaces whose vertices have
integral coordinates and whose sides are parallel to the planes $y_i = 0$. In the
subsequent discussion we assume that the regions R are of this type.

Let

(3.10)    $a_i = \operatorname{l.u.b.}_{(y_1, y_2, \cdots, y_k) \in R} [y_i]$

and

(3.11)    $-b_i = \operatorname{g.l.b.}_{(y_1, y_2, \cdots, y_k) \in R} [y_i]$

then $a_i$ and $b_i$ are positive integers.


² What follows is a generalization of a method previously employed by one of the authors
[6] for the case k = 1.

³ That Q will reach a resting point eventually can be asserted with probability one.
See A. Wald [5], Lemma 1.

We now prove the following: 

LEMMA 3.1. For the given sequence of chance vectors $\{X_i\}$ and the given
region R, the number of possible decision points $N_R$ is given by

(3.12)    $N_R = \prod_{j=1}^{k} (a_j + b_j + r_j + m_j - 1) - \prod_{j=1}^{k} (a_j + b_j - 1)$.

PROOF: We shall first prove this theorem for a rectangular region $R = R_1$
where $R_1$ is defined by $-b_i < y_i < a_i$, $(i = 1, 2, \cdots, k)$, and then generalize
the proof to any region of the class specified.

Let $R_2$ be a closed rectangular region defined by $-(b_j + r_j - 1) \le y_j \le
(a_j + m_j - 1)$. Then $R_2 \supset R_1$. Let $S = R_2 - R_1$. It is clear that every
integral point of S is a possible decision point. Moreover, no point exterior
to $R_2$ is a possible decision point. For assume, for example, that there exists
a point $W = (W_1, W_2, \cdots, W_k)$ which is an exterior point of $R_2$. Then at
least one of its coordinates, say $W_j$, has the property that $W_j > a_j + m_j - 1$
or $W_j < -(b_j + r_j - 1)$. But since $-(b_j - 1) \le W_{j,n-1} \le a_j - 1$, it must
follow that $X_{jn}$ took on a value greater than $m_j$ or less than $-r_j$, which is
contrary to assumption. Now the total number of integral points contained in $R_2$
is $\prod_{j=1}^{k} (a_j + b_j + r_j + m_j - 1)$ and the total number of integral points in $R_1$,
which by assumption are not decision points, is $\prod_{j=1}^{k} (a_j + b_j - 1)$. Hence
the Lemma is proved if R is a rectangular region.
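The count (3.12) for a rectangular region can be checked by direct enumeration. The sketch below is an added illustration, not part of the proof. It assumes that every combination of component increments from $-r_j$ to $m_j$ has positive probability, so that a lattice point is a possible decision point exactly when some chain of such steps through interior points reaches it. The function names and the example dimensions are hypothetical.

```python
from itertools import product
from math import prod

def reachable_decision_points(a, b, r, m):
    """Enumerate decision points for the region -b[j] < y_j < a[j], all j."""
    k = len(a)
    steps = list(product(*[range(-r[j], m[j] + 1) for j in range(k)]))

    def inside(y):
        return all(-b[j] < y[j] < a[j] for j in range(k))

    start = (0,) * k
    seen, frontier, decisions = {start}, [start], set()
    while frontier:
        y = frontier.pop()
        for s in steps:
            z = tuple(yi + si for yi, si in zip(y, s))
            if inside(z):
                if z not in seen:
                    seen.add(z)
                    frontier.append(z)
            else:
                decisions.add(z)   # first point on or outside the boundary
    return decisions

def lemma_count(a, b, r, m):
    """Right-hand side of (3.12)."""
    k = len(a)
    outer = prod(a[j] + b[j] + r[j] + m[j] - 1 for j in range(k))
    inner = prod(a[j] + b[j] - 1 for j in range(k))
    return outer - inner

a, b, r, m = (3, 2), (2, 2), (1, 2), (2, 1)
print(len(reachable_decision_points(a, b, r, m)), lemma_count(a, b, r, m))
```

If some combinations of increments had zero probability, fewer points might be reachable. The enumeration then gives the actual set, while (3.12) counts the possible ones in the sense of the lemma.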

Now, let R be any polygonal region of the type specified and let $R_1$ be the
corresponding rectangular region. Consider two randomly moving points Q
and $Q_1$, each having coordinates $U_t$ at time t. Let the decision points for Q
be defined in terms of R and the decision points of $Q_1$ in terms of $R_1$. We shall
prove that the number of decision points for Q and $Q_1$ are the same.

By assumption, every line parallel to the axes intersects R in an open interval.
Moreover $R_1 \supset R$. Hence the sum of the areas of the segments which compose
the boundary of R must equal the area of the boundary of $R_1$. The same must
be true for the total number of integral points on the boundaries of the two
regions. Thus, the theorem is true for $r_j = m_j = 1$, $(j = 1, 2, \cdots, k)$. We
assume that the theorem is true for $r_j = r_j'$ and $m_j = m_j'$ and prove that it must
hold for $m_u = m_u' + 1$ for a fixed but arbitrary u. Now it is obvious that if
the range of $X_{ui}$ is increased by unity in the positive direction, the point Q
can move an extra unit in the positive direction parallel to the $y_u$ axis. Thus,
the total number of additional decision points that Q gains by the unit increase
in the range of $X_{ui}$ is identical with the total number that $Q_1$ gains. This
proves the theorem.







It is clear that the smallest rectangular region which includes all the decision
points W is $R_2$. We now prove the following:

THEOREM 3.1. For any polygonal region R of the class previously specified,
and for any random sequence $\{X_i\}$ in which $X_i$ takes on a finite number of integral
values, the number of points in the rectangular region $R_2$ which are not decision points
is always equal to $\prod_{j=1}^{k} (a_j + b_j - 1)$ where $a_j + b_j$ are the dimensions of the
rectangular region $R_1$.

PROOF: This Theorem follows from Lemma 3.1 and the fact that the total
number of integral points in $R_2$ is $\prod_{j=1}^{k} (a_j + b_j + r_j + m_j - 1)$.
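3.2. The distribution of W. Let $\psi(t_1, \cdots, t_k)$ be the joint generating function
of $X_{ui}$, $(u = 1, 2, \cdots, k)$, and $\varphi(t_1, \cdots, t_k)$ the joint generating function of
$W_j$ $(j = 1, 2, \cdots, k)$. Then

(3.21) $$\psi(t_1, \cdots, t_k) = \sum_{u_1=-r_1}^{m_1} \cdots \sum_{u_k=-r_k}^{m_k} p_{u_1 \cdots u_k}\, t_1^{u_1} \cdots t_k^{u_k}$$

(3.22) $$\varphi(t_1, \cdots, t_k) = \sum_{v_1=-(b_1+r_1-1)}^{a_1+m_1-1} \cdots \sum_{v_k=-(b_k+r_k-1)}^{a_k+m_k-1} \xi_{v_1 \cdots v_k}\, t_1^{v_1} \cdots t_k^{v_k}$$

where $\xi_{v_1 \cdots v_k}$ is the probability that $W = (v_1, \cdots, v_k)$. In terms of the
generating function $\psi$ the Fundamental Identity (3.17) states that

(3.23) $$E\, t_1^{W_1} \cdots t_k^{W_k}\, \psi(t_1, \cdots, t_k)^{-n} = 1$$

for all $t_1, \cdots, t_k$ for which $|\psi(t_1, \cdots, t_k)| \geq 1$. Hence, it follows that for
$t_1, \cdots, t_k$ for which $\psi(t_1, \cdots, t_k) = 1$, $\varphi(t_1, \cdots, t_k) = 1$. Let

(3.24) $$f(t_1, \cdots, t_k) = t_1^{r_1} \cdots t_k^{r_k}\,\big(\psi(t_1, \cdots, t_k) - 1\big)$$

and

(3.25) $$g(t_1, \cdots, t_k) = t_1^{b_1+r_1-1} \cdots t_k^{b_k+r_k-1}\,\big(\varphi(t_1, \cdots, t_k) - 1\big).$$

Then $f(t_1, \cdots, t_k)$ is a polynomial of degree $r_j + m_j$ in $t_j$ and $g(t_1, \cdots, t_k)$ is
a polynomial of degree $(a_j + b_j + r_j + m_j - 2)$ in $t_j$.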

We shall assume that $f(t_1, \cdots, t_k)$ is an irreducible polynomial. Then, since
$g(t_1, \cdots, t_k)$ vanishes for all values of $t_1, \cdots, t_k$ for which $f(t_1, \cdots, t_k)$ vanishes,
it follows (see, for example, Bôcher [7], Theorem 7, Chapter 16) that f is a factor of g. That is

(3.26) $$g(t_1, \cdots, t_k) = f(t_1, \cdots, t_k) \sum_{s_1=0}^{a_1+b_1-2} \cdots \sum_{s_k=0}^{a_k+b_k-2} C_{s_1 \cdots s_k}\, t_1^{s_1} \cdots t_k^{s_k}$$

where the $C_{s_1 \cdots s_k}$ are unknown. Equating coefficients on both sides of
(3.26) we get













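(3.27) $$\xi_{v_1-(b_1+r_1-1),\,\cdots,\,v_k-(b_k+r_k-1)} = \sum_{u_1=0}^{v_1} \cdots \sum_{u_k=0}^{v_k} \Big( p_{u_1-r_1,\,\cdots,\,u_k-r_k} - \prod_{j=1}^{k} \delta_{u_j r_j} \Big) C_{v_1-u_1,\,\cdots,\,v_k-u_k} + \prod_{j=1}^{k} \delta_{v_j,\, b_j+r_j-1}$$

where $\delta_{ij}$ is the Kronecker delta. But by Theorem 3.1, $\prod_{j=1}^{k} (a_j + b_j - 1)$
of the $\xi_{v_1 \cdots v_k}$ in $\varphi(t_1, \cdots, t_k)$ are zero since they correspond to values of W
which are non-decision points. Hence the left-hand side of (3.27) is zero for
$\prod_{j=1}^{k} (a_j + b_j - 1)$ choices of $(v_1, \cdots, v_k)$; for the choice $v_j = b_j + r_j - 1$
(corresponding to the non-decision point $(0, \cdots, 0)$) the delta term, carried to the
left, contributes $-1$. Hence, we have the required number
of equations to solve for the unknown C's and consequently for the $\xi$'s provided
the determinant of the coefficients is different from zero.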

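As an illustration, let $R = R_1$, then the C's are obtained by solving the set
of linear equations

(3.28) $$\sum_{u_1=0}^{v_1} \cdots \sum_{u_k=0}^{v_k} \Big( \prod_{j=1}^{k} \delta_{u_j r_j} - p_{u_1-r_1,\,\cdots,\,u_k-r_k} \Big) C_{v_1-u_1,\,\cdots,\,v_k-u_k} = \prod_{j=1}^{k} \delta_{v_j,\, b_j+r_j-1}$$

where $v_j$ takes on all integral values from $r_j$ to $a_j + b_j + r_j - 2$ inclusive.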
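
The system (3.28) can be illustrated numerically in the simplest case k = 1. The following Python sketch is not part of the original argument: the names `solve_linear` and `stopping_distribution` are hypothetical, and the code assumes a one-dimensional walk starting at 0 which stops as soon as W >= a or W <= -b, with steps taking integral values from -r to m. It solves (3.28) for the C's in exact arithmetic and then evaluates the right-hand side of (3.27) at the decision points.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Gauss-Jordan elimination in exact arithmetic; A must be square and nonsingular."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for i in range(n):
            if i != col and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [x - factor * y for x, y in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]

def stopping_distribution(p, r, m, a, b):
    """p maps each step u in -r..m to P(X = u); the walk stops once W >= a or W <= -b.

    Returns (C, xi): the unknowns C_0..C_{a+b-2} of (3.28) and a dict xi
    giving the probability of stopping at each decision point w.
    """
    prob = lambda u: Fraction(p.get(u, 0))
    delta = lambda i, j: 1 if i == j else 0
    size = a + b - 1                      # number of non-decision points, -(b-1)..(a-1)
    rows = range(r, a + b + r - 1)        # v = r, ..., a+b+r-2, as in (3.28)
    A = [[delta(v - s, r) - prob(v - s - r) for s in range(size)] for v in rows]
    rhs = [Fraction(delta(v, b + r - 1)) for v in rows]
    C = solve_linear(A, rhs)
    xi = {}
    for v in range(a + b + r + m - 1):    # v = 0, ..., a+b+r+m-2
        w = v - (b + r - 1)               # value of W belonging to this coefficient
        if -b < w < a:
            continue                      # non-decision point: probability zero
        # right-hand side of (3.27); the delta term vanishes at decision points
        xi[w] = sum((prob(v - s - r) - delta(v - s, r)) * C[s] for s in range(size))
    return C, xi

half = Fraction(1, 2)
C_sym, xi_sym = stopping_distribution({-1: half, 1: half}, r=1, m=1, a=2, b=2)
```

For the symmetric walk above the two boundaries are reached with equal probability, as expected. In this one-dimensional case the coefficient matrix is the identity less the matrix of one-step transition probabilities among the non-decision points, which is one way to see why its determinant need not vanish.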
3.3. The distribution of n. For any random variable U, let $E_{v_1 \cdots v_k} U$ stand
for the expected value of U under the restriction that $W = (v_1, v_2, \cdots, v_k)$.
Let $\varphi_1(t_1, \cdots, t_k; \tau)$ be the joint generating function of $W_1, W_2, \cdots, W_k$,
and n. Then

(3.31) $$\varphi_1(t_1, \cdots, t_k; \tau) = \sum_{v_1} \cdots \sum_{v_k} \xi_{v_1 \cdots v_k}\, t_1^{v_1} \cdots t_k^{v_k}\, E_{v_1 \cdots v_k} \tau^n$$

Let

(3.32) $$\psi_1(t_1, \cdots, t_k; \tau) = \tau\, \psi(t_1, t_2, \cdots, t_k) - 1$$

where $\psi(t_1, \cdots, t_k)$ is the joint generating function of $X_{1i}, \cdots, X_{ki}$ and is given
by (3.21) and let

(3.33) $$\psi_2(t_1, \cdots, t_k; \tau) = \varphi_1(t_1, \cdots, t_k; \tau) - 1.$$

Then, if we fix $\tau$ so that $|\tau| < 1$, we see by (3.23) that for all values of $t_1, \cdots, t_k$
for which $\psi_1$ vanishes, $\psi_2$ also vanishes. Let

(3.34) $$f_1(t_1, \cdots, t_k; \tau) = t_1^{r_1} \cdots t_k^{r_k}\, \psi_1(t_1, \cdots, t_k; \tau)$$

and

(3.35) $$f_2(t_1, \cdots, t_k; \tau) = t_1^{b_1+r_1-1} \cdots t_k^{b_k+r_k-1}\, \psi_2(t_1, \cdots, t_k; \tau).$$

Then for $\tau$ fixed, $f_1$ is a polynomial of degree $r_j + m_j$ in $t_j$ and $f_2$ is a polynomial
of degree $a_j + b_j + r_j + m_j - 2$ in $t_j$. Since $f_2$ vanishes for all values of
$t_1, \cdots, t_k$ for which $f_1$ vanishes then if $f_1$ is irreducible, $f_1$ will be a factor of $f_2$.
That is $f_2$ can then be written as













(3.36) $$f_2(t_1, \cdots, t_k; \tau) = f_1(t_1, \cdots, t_k; \tau) \sum_{s_1=0}^{a_1+b_1-2} \cdots \sum_{s_k=0}^{a_k+b_k-2} d_{s_1 \cdots s_k}\, t_1^{s_1} \cdots t_k^{s_k}.$$

The rest of the argument is identical with that employed in section 3.2. The
unknowns in the present case, however, are $\xi_{v_1 \cdots v_k} E_{v_1 \cdots v_k} \tau^n$. When $\xi_{v_1 \cdots v_k} E_{v_1 \cdots v_k} \tau^n$
is expanded in a power series in $\tau$, the coefficient of $\tau^m$ is the probability that
$W = (v_1, \cdots, v_k)$ in exactly m steps. We shall, therefore, examine the validity
of the expansion of the above function in the neighborhood of $\tau = 0$.

Let us first consider the rectangular region $R = R_1$. In this case the d's
are obtained from the equations

(3.37) $$\sum_{u_1=0}^{v_1} \cdots \sum_{u_k=0}^{v_k} \Big( \prod_{j=1}^{k} \delta_{u_j r_j} - \tau\, p_{u_1-r_1,\,\cdots,\,u_k-r_k} \Big) d_{v_1-u_1,\,\cdots,\,v_k-u_k} = \prod_{j=1}^{k} \delta_{v_j,\, b_j+r_j-1}$$
$$(v_j = r_j, \cdots, a_j + b_j + r_j - 2),$$

so that $\xi_{v_1 \cdots v_k} E_{v_1 \cdots v_k} \tau^n$ will be given as a ratio of two polynomials in $\tau$ the
denominator of which will be the determinant of the coefficients of (3.37).
But this determinant equals unity when $\tau = 0$. Hence the validity of the
expansion is established for a rectangular region.

If R is not a rectangle, the value of the determinant of the equations in d
will still be unity. This follows from the fact that the number of non-decision
points in $R_2$ is precisely the same as the number of non-decision points contained
in $R_1$, hence by rearranging of the equations they can be made to assume
the form (3.37).


REFERENCES 

[1] P. LÉVY, Théorie de l'Addition des Variables Aléatoires, Paris, 1937.

[2] J. L. DOOB, "Regularity properties of certain families of chance variables," Trans.
Amer. Math. Soc., Vol. 47 (1940), pp. 455-486.

[3] A. WALD, "Some generalizations of the theory of cumulative sums of random variables,"
Annals of Math. Stat., Vol. 16 (1945).

[4] D. BLACKWELL, "On an equation of Wald," Annals of Math. Stat., Vol. 17 (1946).

[5] A. WALD, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944).

[6] M. A. GIRSHICK, "Contributions to the theory of sequential analysis," Annals of Math.
Stat., Vol. 17 (1946).

[7] MAXIME BÔCHER, Introduction to Higher Algebra, Macmillan Co.





APPROXIMATION OF THE DISTRIBUTION OF THE PRODUCT OF 
BETA VARIABLES BY A SINGLE BETA VARIABLE 


By JOHN W. TUKEY AND S. S. WILKS
Princeton University 
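1. Introduction. In an article published elsewhere in the present issue of the
Annals of Mathematical Statistics [1] the g-th moments of two statistical test
criteria $L_{mvc}$ and $L_{vc}$ were found to have the following expressions, respectively:

(1) $$E(L_{mvc}^{g}) = (k-1)^{g(k-1)} \prod_{i=1}^{k-1} \left[ \frac{\Gamma(\tfrac{1}{2}(n-1-i)+g)}{\Gamma(\tfrac{1}{2}(n-1-i))} \right] \frac{\Gamma(\tfrac{1}{2}n(k-1))}{\Gamma(\tfrac{1}{2}n(k-1)+g(k-1))}$$

and

(2) $$E(L_{vc}^{g}) = (k-1)^{g(k-1)} \prod_{i=1}^{k-1} \left[ \frac{\Gamma(\tfrac{1}{2}(n-1-i)+g)}{\Gamma(\tfrac{1}{2}(n-1-i))} \right] \frac{\Gamma(\tfrac{1}{2}(n-1)(k-1))}{\Gamma(\tfrac{1}{2}(n-1)(k-1)+g(k-1))}.$$

If we denote by $(a)_g$ the expression $a(a+1)(a+2) \cdots (a+g-1)$ and
make use of the fact that

(3) $$\Gamma(a+g) = \Gamma(a) \cdot (a)_g$$

and

(4) $$\Gamma(a+rg) = \Gamma(a) \cdot (a)_{rg} = \Gamma(a) \cdot r^{rg} \prod_{j=1}^{r} \left( \frac{a}{r} + \frac{j-1}{r} \right)_g$$

where r is a positive integer, the two moments (1) and (2) reduce to

(5) $$\prod_{i=1}^{k-1} \frac{\left( \frac{n}{2} - \frac{i+1}{2} \right)_g}{\left( \frac{n}{2} + \frac{i-1}{k-1} \right)_g} \quad \text{and} \quad \prod_{i=1}^{k-1} \frac{\left( \frac{n-1}{2} - \frac{i}{2} \right)_g}{\left( \frac{n-1}{2} + \frac{i-1}{k-1} \right)_g}$$

respectively.
For any given value of i $(i = 1, 2, \cdots, k-1)$ the ratio

$$\frac{\left( \frac{n}{2} - \frac{i+1}{2} \right)_g}{\left( \frac{n}{2} + \frac{i-1}{k-1} \right)_g} \quad \text{or} \quad \frac{\left( \frac{n-1}{2} - \frac{i}{2} \right)_g}{\left( \frac{n-1}{2} + \frac{i-1}{k-1} \right)_g}$$

may be expressed in the form

$$\frac{\Gamma(p_i+q_i)\,\Gamma(p_i+g)}{\Gamma(p_i)\,\Gamma(p_i+q_i+g)}$$

which is the g-th moment of a beta variable $u_i$ distributed according to

$$\frac{\Gamma(p_i+q_i)}{\Gamma(p_i)\,\Gamma(q_i)}\, u_i^{p_i-1} (1-u_i)^{q_i-1}\, du_i.$$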



Each of the moments in (5) is therefore of the form

$$\prod_{i=1}^{k-1} \frac{\Gamma(p_i+q_i)\,\Gamma(p_i+g)}{\Gamma(p_i)\,\Gamma(p_i+q_i+g)}.$$

Thus, $L_{mvc}$ and $L_{vc}$ are each distributed like the product of k − 1 independent
beta variables.
Each of the moments in (5) can be expressed in the general form

(6) $$M_g = \prod_{i=1}^{r'} \frac{\left( \frac{1}{x} - A_i + 1 \right)_g}{\left( \frac{1}{x} - B_i + 1 \right)_g}$$

where $x = \frac{2}{n}$ or $\frac{2}{n-1}$; $A_i$ and $B_i$ are real numbers.

Other likelihood ratio statistical test criteria which have been discussed in the
literature have moments which can be expressed in the general form (6). For
example, the likelihood ratio criterion $L_1$ for testing the homogeneity of sample
variances [2, Neyman and Pearson 1931] has moments of this type. The generalized
$L_1$ criterion for samples from a normal multivariate population [3,
Wilks 1932] has such moments. The criterion for testing sphericity [4, Mauchly
1940] of a normal multivariate distribution has moments of this kind. All test
criteria having this type of moment lie on the interval (0,1). The exact distribution
functions of the criteria, except possibly for r = 1 or 2 in some cases,
are very complicated.

The purpose of this note is to consider a method of finding a fractional power
of the test criterion which is approximately distributed (in a sense to be described
later) according to an incomplete beta (Pearson Type I) distribution function,

(7) $$dF(u) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, u^{p-1} (1-u)^{q-1}\, du$$

and to find the appropriate values of p, q, and the exponent of the criterion.


2. Generalized hypergeometric series as moment generating functions.
Suppose L is a statistical test criterion, or more generally a random variable
having as its g-th moment the expression (6). The moment generating function
$\varphi(t)$ of L can be expressed as

(8) $$\varphi(t) = \sum_{g=0}^{\infty} t^g \prod_{i=1}^{r'} \frac{\left( \frac{1}{x} - A_i + 1 \right)_g}{\left( \frac{1}{x} - B_i + 1 \right)_g}.$$

This can be written as

(9) $$\varphi(t) = {}_{r'+1}F_{r'} \left[ 1,\ \tfrac{1}{x} - A_1 + 1,\ \cdots,\ \tfrac{1}{x} - A_{r'} + 1;\ \tfrac{1}{x} - B_1 + 1,\ \cdots,\ \tfrac{1}{x} - B_{r'} + 1;\ t \right]$$

where ${}_{r'+1}F_{r'}[\ ]$ is a generalized hypergeometric series [5, Bailey 1935]. We
shall not make explicit use of this fact; instead, we shall work with the coefficient
of $t^g$ in the series, i.e., $M_g$.
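Let us consider

(10) $$\ln M_g = \sum_{i=1}^{r'} \ln \left( \frac{1}{x} - A_i + 1 \right)_g - \sum_{i=1}^{r'} \ln \left( \frac{1}{x} - B_i + 1 \right)_g.$$

To expand this in a power series in x consider a single term

(11) $$\ln \left( \frac{1}{x} - A + 1 \right)_g = \sum_{j=1}^{g} \ln \left( \frac{1}{x} - A + j \right) = -g \ln x + g \ln(1 - Ax) + \sum_{j=1}^{g} \ln \left( 1 + \frac{jx}{1 - Ax} \right).$$

Now

$$1 + \frac{jx}{1 - Ax} = 1 + jx + jAx^2 + jA^2x^3 + \cdots$$

Writing

$$S_n(g) = \sum_{j=1}^{g} j^n,$$

and using the usual expansion for $\ln(1 + z)$, we find

$$\ln \left( \frac{1}{x} - A + 1 \right)_g = -g \ln x + [S_1(g) - Ag]x + [-\tfrac{1}{2}A^2 g + A S_1(g) - \tfrac{1}{2}S_2(g)]x^2$$
$$+ [-\tfrac{1}{3}A^3 g + A^2 S_1(g) - A S_2(g) + \tfrac{1}{3}S_3(g)]x^3 + \cdots$$

Applying this expansion to the separate terms in (10) and writing

(12) $$C_n = \sum_{i=1}^{r'} A_i^n - \sum_{i=1}^{r'} B_i^n$$

the terms not involving $A_i$ or $B_i$ cancel out leaving

(13) $$\ln M_g = (-C_1 g)x + [-\tfrac{1}{2}C_2 g + C_1 S_1(g)]x^2 + [-\tfrac{1}{3}C_3 g + C_2 S_1(g) - C_1 S_2(g)]x^3 + \cdots.$$

We shall return to this expression later.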
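
As a numerical check on the expansion, the series (13) may be compared with the exact value of (10). The following Python sketch is an illustration only, not part of the original note; the function names are hypothetical, and the exact value is obtained from the standard identity ln (a)_g = ln Γ(a + g) − ln Γ(a), which is (3) above.

```python
import math

def log_moment_exact(g, x, A, B):
    """ln M_g from (10), with ln (a)_g computed as lgamma(a + g) - lgamma(a)."""
    def log_poch(a):
        return math.lgamma(a + g) - math.lgamma(a)
    return (sum(log_poch(1 / x - a + 1) for a in A)
            - sum(log_poch(1 / x - b + 1) for b in B))

def log_moment_series(g, x, A, B):
    """The first three terms of the expansion (13)."""
    C = lambda n: sum(a ** n for a in A) - sum(b ** n for b in B)   # C_n of (12)
    S = lambda n: sum(j ** n for j in range(1, g + 1))              # S_n(g)
    return ((-C(1) * g) * x
            + (-C(2) * g / 2 + C(1) * S(1)) * x ** 2
            + (-C(3) * g / 3 + C(2) * S(1) - C(1) * S(2)) * x ** 3)

# Illustrative values only: three A's and three B's, x small, second moment.
A_demo = [2.5, 2.0, 1.5]
B_demo = [1.0, 2.0 / 3.0, 1.0 / 3.0]
gap = abs(log_moment_exact(2, 0.01, A_demo, B_demo)
          - log_moment_series(2, 0.01, A_demo, B_demo))
```

The discrepancy `gap` should be of the order of the first neglected term, that is, of order x to the fourth power.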
3. Powers of a beta variable. If u has (7) as its distribution function, then

(14) $$E(u^h) = \frac{(p)_h}{(p+q)_h}.$$

If $v = u^r$, r integral, then its g-th moment is given by setting h = rg in (14).
We have

$$E(v^g) = \frac{(p)_{rg}}{(p+q)_{rg}}.$$

But

$$(p)_{rg} = r^{rg} \prod_{j=1}^{r} \left( \frac{p}{r} + \frac{j-1}{r} \right)_g$$

so that

(15) $$E(v^g) = \frac{\prod_{j=1}^{r} \left( \frac{p}{r} + \frac{j-1}{r} \right)_g}{\prod_{j=1}^{r} \left( \frac{p+q}{r} + \frac{j-1}{r} \right)_g}$$

which is a special case of (6) when p is of order n.
Putting $\frac{1}{x} = \frac{p+q}{r}$, $A_i = 1 + (q - i + 1)/r$, and $B_i = 1 - (i - 1)/r$
we have

$$C_1 = q, \qquad C_2 = \frac{q^2}{r} + q\left(1 + \frac{1}{r}\right).$$

For any given moment of the form (6), from which x, $C_1$, and $C_2$ can be computed,
we determine p, q, and r so as to satisfy

(16) $$\frac{p+q}{r} = \frac{1}{x}, \qquad q = C_1$$

and to satisfy, as nearly as possible, (with r integral)

(17) $$\frac{q^2}{r} + q\left(1 + \frac{1}{r}\right) = C_2,$$

i.e.,

$$r = \frac{q(q+1)}{C_2 - q}.$$

The use of fractional r is obviously suggested, but its value and validity are
not discussed here. Using the values of p, q and r thus obtained, the distribution
of the criterion L (having moments (6)), is given approximately by

(18) $$\frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)} \left( \sqrt[r]{L} \right)^{p-1} \left( 1 - \sqrt[r]{L} \right)^{q-1} d\left( \sqrt[r]{L} \right)$$

where the approximation is such that all moments are correct through terms of
order $\frac{r}{p+q}$ (when moments are expanded in series of $\frac{r}{p+q}$) and nearly
(exactly if there is an integral value of r satisfying (17)) correct through terms of
order $\left( \frac{r}{p+q} \right)^2$.
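
The fitting rule (16) and (17) is easily mechanized. The Python sketch below is an illustration under stated assumptions rather than a procedure given in this note: the name `fit_beta` is hypothetical, r is rounded with Python's built-in `round` (which rounds exact halves to the nearest even integer), and a floor of r = 1 is imposed as a safeguard of our own.

```python
def fit_beta(x, A, B):
    """Return (r_exact, r, p, q) for a criterion with moments of the form (6).

    q is C_1, r_exact is q(q+1)/(C_2 - q) from (17), r is r_exact rounded
    to an integer, and p follows from (p + q)/r = 1/x in (16).
    """
    C1 = sum(A) - sum(B)
    C2 = sum(a * a for a in A) - sum(b * b for b in B)
    q = C1
    r_exact = q * (q + 1) / (C2 - q)
    r = max(1, round(r_exact))          # safeguard: never less than 1
    p = r / x - q
    return r_exact, r, p, q

# Illustrative input: two factors, with x = 2/(n - 1) and n = 20.
n_demo = 20
r_exact_demo, r_demo, p_demo, q_demo = fit_beta(2 / (n_demo - 1), [2.0, 1.5], [1.0, 0.5])
```

With these inputs C_1 = 2 and C_2 = 5, so that (17) is satisfied exactly by r = 2, and p comes out as n − 3.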
p+q 


4. Examples. Returning to the g-th moment of $L_{mvc}$ given by the first expression
in (5) we have

$$x = \frac{2}{n}, \qquad r' = k - 1,$$

$$A_i = \frac{k+3-i}{2}, \qquad B_i = \frac{k-i}{k-1},$$

$$C_1 = \sum_{i=1}^{k-1} A_i - \sum_{i=1}^{k-1} B_i = \tfrac{1}{4}(k^2 + 3k - 6),$$

$$C_2 = \sum_{i=1}^{k-1} A_i^2 - \sum_{i=1}^{k-1} B_i^2 = \tfrac{1}{24}[(k+2)(k+3)(2k+5) - 84] - \frac{k(2k-1)}{6(k-1)}.$$

To determine p, q and r for the fitted distribution of $L_{mvc}$ we set

$$\frac{p+q}{r} = \frac{n}{2}, \qquad q = \tfrac{1}{4}(k^2 + 3k - 6), \qquad r = \frac{q(q+1)}{C_2 - q}$$

and solve for p, q and r. We have the following table of values, p, q and r for
various values of k (p being calculated by using the rounded values of r):


    k    r (rounded)         p            q

    3        2             n − 3          3
    4        3          1.5n − 5.5        5.5
    5        4            2n − 8.5        8.5
    6        5          2.5n − 12        12
    7        6            3n − 16        16
    8        6            3n − 20.5      20.5
    9        7          3.5n − 25.5      25.5


Thus, by rounding r off to the nearest integer and using this rounded value of r
in determining p, we have values of p, q and r for each value of k, which, when
substituted in (18) give us the desired fitted beta distribution for $L_{mvc}$. For
k = 3, the fitted distribution is the exact distribution.

For the g-th moment of $L_{vc}$ which is given by the second expression in (5),
it is convenient to expand in powers of $\frac{2}{n-1}$. Hence we have

$$x = \frac{2}{n-1}, \qquad r' = k - 1,$$

$$A_i = \frac{k+2-i}{2}, \qquad B_i = \frac{k-i}{k-1},$$

$$C_1 = \tfrac{1}{4}(k^2 + k - 4),$$

$$C_2 = \tfrac{1}{24}[(k+1)(k+2)(2k+3) - 30] - \frac{k(2k-1)}{6(k-1)}.$$

To determine p, q and r for fitting the distribution function of $L_{vc}$ we put

$$\frac{p+q}{r} = \frac{n-1}{2}, \qquad q = \tfrac{1}{4}(k^2 + k - 4), \qquad r = \frac{q(q+1)}{C_2 - q}.$$

We have the following table of values of p, q and r for several values of k:

    k      r      r (rounded)         p             q

    3     2           2             n − 3           2
    4     2.88        3          1.5n − 5.5         4
    5     3.71        4            2n − 8.5         6.5
    6     4.52        5          2.5n − 12          9.5
    7     5.32        5          2.5n − 15.5       13
    8     6.14        6            3n − 20         17
    9     6.88        7          3.5n − 25         21.5
   10     7.82        8            4n − 30.5       26.5
   20    15.26       15          7.5n − 111.5     104

By rounding r off to the nearest integer, and using the rounded value of r in
determining p, we have values of p, q and r for each value of k which, when
substituted in (18), give us the desired fitted beta distribution for $L_{vc}$. For k = 3,
the fitted distribution is the exact distribution.













For a given value of k, approximate 5% and 1% points of $\sqrt[r]{L_{mvc}}$ and $\sqrt[r]{L_{vc}}$
can therefore be obtained from Thompson's [6] tables of the Incomplete Beta
Function by entering the tables with $\nu_1 = 2q$, and $\nu_2 = 2p$. For example, for
k = 6 the 5% and 1% points of $\sqrt[5]{L_{mvc}}$ are obtained by entering Thompson's
tables with $\nu_1 = 24$, and $\nu_2 = 5n - 24$.
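
The entries for Thompson's tables can be computed directly from the constants of this section. The Python sketch below is offered as an illustration only: the function names and the labels "mvc" and "vc" are hypothetical, exact fractions are used throughout, and p is returned as a pair (coefficient of n, constant term) because n is left unspecified.

```python
from fractions import Fraction

def criterion_constants(k, criterion):
    """A_i and B_i of section 4 for criterion 'mvc' or 'vc', i = 1, ..., k-1."""
    shift = {"mvc": 3, "vc": 2}[criterion]
    A = [Fraction(k + shift - i, 2) for i in range(1, k)]
    B = [Fraction(k - i, k - 1) for i in range(1, k)]
    return A, B

def thompson_entries(k, criterion):
    """Fitted r, q and the table entries nu1 = 2q, nu2 = 2p (as a linear form in n)."""
    A, B = criterion_constants(k, criterion)
    q = sum(A) - sum(B)                                   # C_1
    C2 = sum(a * a for a in A) - sum(b * b for b in B)
    r_exact = q * (q + 1) / (C2 - q)
    r = round(r_exact)                                    # nearest integer
    # 1/x is n/2 for L_mvc and (n - 1)/2 for L_vc, so p = r/x - q is linear in n
    inv_x_const = 0 if criterion == "mvc" else Fraction(-1, 2)
    p_slope = Fraction(r, 2)
    p_const = r * inv_x_const - q
    return {"r_exact": r_exact, "r": r, "q": q,
            "nu1": 2 * q, "nu2": (2 * p_slope, 2 * p_const)}

entries_mvc_6 = thompson_entries(6, "mvc")
entries_vc_5 = thompson_entries(5, "vc")
```

For k = 6 and the criterion $L_{mvc}$ this reproduces the entries quoted in the text, namely 24 and 5n − 24.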


REFERENCES 


[1] S. S. WILKS, "Sample criteria for testing equality of means, equality of variances and
equality of covariances in a normal multivariate population," Annals of Math.
Stat., Vol. 17 (1946), pp. 257-281.

[2] J. NEYMAN AND E. S. PEARSON, "On the problem of k samples," Bulletin de l'Académie
Polonaise des Sciences et des Lettres, Série A, Sciences Mathématiques, 1930 and
1931, pp. 460-481.

[3] S. S. WILKS, "Certain generalizations in the analysis of variance," Biometrika, Vol. 24
(1932), pp. 471-494.

[4] JOHN W. MAUCHLY, "Significance test for sphericity of a normal n-variate distribution,"
Annals of Math. Stat., Vol. 11 (1940), pp. 204-209.

[5] W. N. BAILEY, Generalized Hypergeometric Series, Cambridge Tracts in Mathematics
and Mathematical Physics, No. 32, Cambridge University Press, 1935.

[6] CATHERINE M. THOMPSON, "Tables of percentage points of the Incomplete Beta Function,"
Biometrika, Vol. 32, Part II (1941), pp. 187-191.










SOME FUNDAMENTAL CURVES FOR THE 
SOLUTION OF SAMPLING PROBLEMS 


By EDWARD C. MOLINA


East Orange, N. J. 


1. Summary. In using collateral information in an inverse probability situation
to estimate a population fraction from a sample fraction it is necessary to
use some particular form for the a priori probability function. This paper points
out the advantages of using $Kx^r(1 - x)^s$ for this purpose. The application
then involves only the Incomplete Beta Function.

Graphs of the 10, 25, 50, 75 and 90 per cent points of the Incomplete Beta 
Function are given. They cover a range which includes and extends previous 
tabulations. 


2. Introduction. The engineer, scientist or industrialist is often confronted 
with the following “sampling” problem: 

"The probability, p, of an event happening in a single trial is constant from
trial to trial, but the numerical value of this constant is unknown. A series
of n trials is made and the event happens c times, c < n. What light does
this statistical data shed on the unknown value of p?"

As a concrete example, suppose that a new type of brakes is proposed for a
given class of steam locomotives making the run from Buffalo to Detroit.¹
Let each of 30 locomotives be equipped with a set of the new brakes and given a
trial run. Of these, 26 make satisfactory runs, so far as the behavior of the
brakes is concerned; the remaining four encounter difficulties. Here, the event
of interest is a satisfactory run, n = 30 and c = 26. What "weight" ("confidence"²)
may the design engineer assign to the assumption that, say, 25/30 <
p < 27/30?

Practical decisions involving such statistical data are usually based on a com- 
bination of the data with ‘“‘collateral” information. In fact, the applied statis- 
tician is all too familiar with the extreme case where the statistical data are so 
meagre as to provide no information and where a decision must be made now— 
in these cases the decision is made solely on the basis of the collateral informa- 
tion, and rightly so. 

The methods of statistical analysis and presentation developed up to the pres- 
ent have concentrated on the other extreme case, where the statistical data are 
so good that collateral information can be neglected. 


¹ This fictitious example convicts the writer of total ignorance of railroad engineering.
Nevertheless, the illustration brings out, in concrete terms, the class of sampling problems
under consideration.

² The purely intuitive meaning to be attached to "weight" and "confidence" is the same.
However, the curves presented with this paper are not based on the theory which underlies
what are known, in statistical literature, as "confidence intervals".




There is a real need for methods of analysis and presentation to be used where
both the statistical data and the collateral information should be used. However,
when the significance of the collateral information is adequately expressed
by a function w(x), x being a permissible value of the unknown p, the classic
Bayes-Laplace theory (see [1]) of inverse probability gives the solution to a
sampling problem.

The purpose of this paper is to present a set of sampling curves based on a
w(x) function whose form embodies some important properties.³


3. Hardy’s collateral frequency function. Consider again the locomotive 
brakes problem. The new design may have been carefully engineered, in ac- 
cordance with well-known principles, to reduce costs at the expense of a slight 
reduction in reliability of operation. In such a situation, the collateral informa- 
tion would be somewhat as follows: There is a high “probability” that the un- 
known value of p is a little below the known value for the old type of brakes. 
Moreover, it may be assumed that the "probability" drops rapidly for values
of p departing materially from this old value. Suppose the latter is p = .95; 
then the collateral information would be presented by some such curve as num- 
ber 5 in Figure 1, the mode (peak) of this curve being at .90, which is slightly 
below the old .95 value. 

Number 5, of Figure 1, belongs to the family of curves corresponding to the
frequency function

$$w(x) = Kx^r(1 - x)^s$$

This form for w(x) was suggested, in 1889, by the British actuary Sir George
F. Hardy (see [2]) for the construction of mortality tables. Its mode, mean
and variance are given by the equations

Mode = r/(r + s)
Mean = (r + 1)/(r + s + 2)
Variance = (r + 1)(s + 1)/[(r + s + 2)² (r + s + 3)]
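
These three constants are easily evaluated for any curve of the family. The short Python sketch below is an illustration only; the function name `hardy_summary` is hypothetical, and the variance is computed with the squared factor (r + s + 2)², which is the standard result for a frequency function proportional to $x^r(1 - x)^s$.

```python
from fractions import Fraction

def hardy_summary(r, s):
    """Mode, mean and variance of w(x) = K x^r (1 - x)^s, for r + s > 0."""
    r, s = Fraction(r), Fraction(s)
    mode = r / (r + s)
    mean = (r + 1) / (r + s + 2)
    variance = (r + 1) * (s + 1) / ((r + s + 2) ** 2 * (r + s + 3))
    return mode, mean, variance

# Curve number 5 of Figure 1: r = 9, s = 1.
mode5, mean5, var5 = hardy_summary(9, 1)
```

For curve number 5 the mode is 9/10, in agreement with the peak at .90 mentioned above.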


G. J. Lidstone (see [3]) has pointed out that the Hardy form for w(x) has two 
important advantages: First—‘‘By suitable choice of r and s any required values 
of the mode or mean and the variance of z, can be reproduced, and thus a great 
variety of distributions may be approximately represented.’ Lidstone’s 
zz is our w(x). Second: "The factors $x^r$ and $(1 - x)^s$ unite in the simplest
and most elegant way with similar factors in the Laplacian integrand... ".


³ Many statisticians, including a referee of this paper, feel that it is a common situation
to have the collateral information so vague and elusive that it is virtually impossible to 
take it into account via inverse probability. (The author doubts this.) Such statisticians 
may wish to use the Clopper-Pearson confidence intervals, using no collateral information, 
in which case these curves can be used as indicated by Scheffé (‘‘Note on the use of the 
tables of percentage points of the incomplete beta function to calculate small sample 
confidence intervals for a binomial p’’, Biometrika, August, 1944). 










From this second advantage there follows a third which will be presented in 
section 6 below. 


4. Theory. The Bayes-Laplacian formula gives us

(1) $$P(p \leq X) = \int_0^X w(x)\,x^c(1 - x)^{n-c}\,dx \Big/ \int_0^1 w(x)\,x^c(1 - x)^{n-c}\,dx$$

for the "a posteriori probability" that p ≤ X. In this formula, the product
$x^c(1 - x)^{n-c}$ takes care of the fact that the event happened c times in the n
trials; the factor w(x) represents, quantitatively, the collateral information.

Fig. 1. Particular forms of the a priori (collateral information) function
$w(x) = Kx^r(1 - x)^s$ (abscissa: x)

    Curve     r      s      Form
      1       0      0      K
      2       ½      ½      $Kx^{1/2}(1 - x)^{1/2}$
      3       1      1      $Kx(1 - x)$
      4       2      1      $Kx^2(1 - x)$
      5       9      1      $Kx^9(1 - x)$

Adopting, now, Hardy's frequency function, we assume that

(2) $$w(x) = Kx^r(1 - x)^s,$$













r and s being assigned values in accordance with the collateral information 
pertaining to the particular problem under consideration. Theoretically, the 
constant K should be such that 


$$\int_0^1 w(x)\,dx = 1,$$


but, since w(x) enters in both numerator and denominator of (1), any desirable 
value may be given to K. Advantage has been taken of this in constructing 
Figure 1; to facilitate comparison of the five curves shown therein, for each 
curve K is such that the maximum ordinate is equal to 1. 

The second advantage, pointed out by Lidstone, of the form adopted in this
paper for the function w(x) becomes apparent immediately on substitution of
(2) in (1). We obtain

(3) $$P(p \leq X) = \int_0^X x^C(1 - x)^{N-C}\,dx \Big/ \int_0^1 x^C(1 - x)^{N-C}\,dx$$

with C = c + r and N = n + r + s. Therefore, a single family of fundamental
curves, plotted with reference to C and N, will give the solutions for a multitude
of different practical problems. To solve a particular problem, for which the
values of n, c, r and s are specified, we merely enter the curves with C = c + r
and N = n + r + s. These linear relations transform all a posteriori curves,
published on the assumption that w(x) is a constant, into fundamental curves,
that is, curves applicable with the more general form (2). For example:
The information given on the sheets of inverse curves (inserted in the back cover
pocket) of Col. Leslie E. Simon's Engineer's Manual of Statistical Methods includes
the restriction "that prior to sampling, one lot fraction defective is as
likely as another". It is now obvious that the use of Col. Simon's curves is
not so limited; his curves may be used in any situation wherein the available
collateral information is covered by the assumption that w(x) has the Hardy
form. Likewise, the "Weight = .98" and "Weight = .8" curves ("confidence",
in the intuitive sense), presented by R. P. Crowell and the writer in their paper [4]
now have a much wider range of applicability.
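
For integral values of C and N the ratio in (3) can be evaluated without tables by a binomial summation. The Python sketch below is an illustration, not a method stated in this paper: the function name is hypothetical, and it relies on the standard identity that the ratio in (3) equals the probability of at least C + 1 successes in N + 1 binomial trials with success probability X. It therefore assumes that r and s, as well as c and n, are non-negative integers.

```python
from math import comb

def posterior_probability(X, c, n, r=0, s=0):
    """P(p <= X) from equation (3), for integer c, n, r, s and 0 <= X <= 1."""
    C, N = c + r, n + r + s
    # Upper binomial tail: P(Binomial(N + 1, X) >= C + 1)
    return sum(comb(N + 1, j) * X ** j * (1 - X) ** (N + 1 - j)
               for j in range(C + 1, N + 2))

# Locomotive example: n = 30, c = 26, with the r = 9, s = 1 curve of Figure 1.
weight = (posterior_probability(27 / 30, 26, 30, r=9, s=1)
          - posterior_probability(25 / 30, 26, 30, r=9, s=1))
```

Here `weight` is the a posteriori probability that 25/30 < p ≤ 27/30, which is the quantity asked for in the Introduction.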


5. Curves. The ratio of definite integrals in equation (3) is tabulated, in a
different notation, in "Tables of the Incomplete Beta-Function", edited by
Karl Pearson.

    This paper      Pearson Tables      Thompson Tables (see [5])

    C               p − 1               (ν₂ − 2)/2
    N − C           q − 1               (ν₁ − 2)/2
    X               x                   tabulated value
    P(p < X)        tabulated value     caption to Table

The range of values of C and (N − C) covered by the Pearson Tables is indicated
by the shaded area in Figure 7. For curve points falling outside this
range (except for C = 1 and 2, found from the binomial summation by trial
and error) recourse was had to a series developed by the writer for the solution
of some problems confronting him, as Switching Theory Engineer, in the Bell
Telephone Laboratories. Many points of the C = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12
and 14 curves can be obtained directly from the Thompson Tables. They do
not, however, give any points for the C = 16, 18, 20, 25, 30, 40, 45 and 50 curves.
It may be added that, except for certain marginal values, the Thompson Tables
were also derived from the Pearson Tables.

Fig. 2. Fundamental curves for P(p ≤ p₁) = .25










Five sets of fundamental curves are submitted, namely,

    Figure 2,    P(p < X) = .25,    X = p₁
           3,               .75,    X = p₂
           4,               .10,    X = p₁
           5,               .90,    X = p₂
           6,               .50,    X = p₀


It will be noted that p₁ has been written instead of X for the curves such that
P(p < X) is less than .50; likewise, p₂ for X for those corresponding to P(p < X)
greater than .50; p₀ for X for the P(p < X) = .50 curves.


Fig. 4. Fundamental curves for P(p ≤ p₁) = .10
For each pair of values of C and N, the curves of Figures 2 and 3 give the range

    P(p₁ < p ≤ p₂) = .50

whereas the curves of Figures 4 and 5 give the range

    P(p₁ < p ≤ p₂) = .80

As an example of the applicability of the fundamental curves, let us reconsider
the locomotive problem for which n = 30 and c = 26. It was suggested that
the r = 9, s = 1 curve of Figure 1 might well represent the collateral information
available. Therefore we take N = 30 + 9 + 1 = 40 and C = 26 + 9 = 35.
Entering Figures 2, 3, 4 and 5 with this data we find

    Fig.    P(p ≤ p₁)    p₁        Fig.    P(p ≤ p₂)    p₂

     2        .25        .83        3        .75        .89
     4        .10        .79        5        .90        .92

Fig. 5. Fundamental curves for P(p ≤ p₂) = .90





Fig. 6. Fundamental curves for P(p ≤ p₀) = .50

Thus we have, for the unknown probability of a successful run with a new set
of brakes,

    .83 < p < .89, with weight .50

and

    .79 < p < .92, with weight .80
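
The values read from the curves can be checked by direct computation. The Python sketch below is an illustration only: the function names are hypothetical, the ratio in (3) is evaluated by the binomial summation valid for integral C and N, and the percentage points are located by simple bisection rather than by any method described in this paper.

```python
from math import comb

def posterior_cdf(X, C, N):
    """Ratio of integrals in (3): P(p <= X) for integer C and N."""
    return sum(comb(N + 1, j) * X ** j * (1 - X) ** (N + 1 - j)
               for j in range(C + 1, N + 2))

def percentage_point(prob, C, N, tol=1e-10):
    """Value of X at which P(p <= X) equals prob, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, C, N) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Locomotive example: C = 35, N = 40.
points = {prob: percentage_point(prob, 35, 40) for prob in (0.10, 0.25, 0.75, 0.90)}
```

The computed points should agree with the graphical readings above to within the accuracy with which the curves can be read.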
6. Sequential property of the curves. The original draft of this paper was
submitted to Dr. W. V. Houston⁴ in connection with the solution of a problem
in which he was interested. Regarding equation (3), Dr. Houston made a very
significant comment, the burden of which may be stated as follows: Suppose
that before the series of n trials had been made, it was known that, at some
earlier time, a series of r + s trials had resulted in r successful outcomes. Suppose,
moreover, that the collateral information called for the assumption that,
a priori, all values of p were equally likely. Under these circumstances equation
(3), derived by substitution of (2) in (1), gives P(p < X) for two consecutive
series of trials, one of r + s with r successes followed by another of n with c
successes. An immediate generalization of Dr. Houston's thought shows that
the fundamental curves may be entered with

$$N = n_1 + n_2 + \cdots + n_i + \cdots + n_m + r + s,$$
$$C = c_1 + c_2 + \cdots + c_i + \cdots + c_m + r,$$

for the solution of a problem involving m consecutive series of trials, $n_i$ and $c_i$
being the number of trials and successes, respectively, in the ith series; the introduction
of r and s removing the restriction that all values of p were a priori
equally likely.

⁴ Of the California Institute of Technology and now President of Rice Institute, Houston,
Texas. It was Dr. Houston who gave the impetus to the publication of this paper.

Fig. 7. "Tables of the Incomplete Beta-Function," edited by Karl Pearson, can be used for
evaluation of

$$\int_0^X x^C(1 - x)^{N-C}\,dx \Big/ \int_0^1 x^C(1 - x)^{N-C}\,dx$$

only when values of (N − C) and C are in the shaded area.
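
The pooling rule just stated amounts to two sums. As a brief illustration in Python (the function name `curve_entry` and the representation of each series as a pair (n_i, c_i) are conveniences of this sketch, not notation from the paper):

```python
def curve_entry(series, r=0, s=0):
    """Return (C, N) for entering the fundamental curves.

    series is a list of (n_i, c_i) pairs, the trials and successes of each
    consecutive series; r and s are the exponents of the Hardy function (2).
    """
    N = sum(n for n, _ in series) + r + s
    C = sum(c for _, c in series) + r
    return C, N

# One series of 30 trials with 26 successes, with the r = 9, s = 1 prior ...
entry_with_prior = curve_entry([(30, 26)], r=9, s=1)
# ... gives the same entry as an earlier series of 10 trials with 9 successes
# followed by the 30 trials, all values of p being a priori equally likely.
entry_two_series = curve_entry([(10, 9), (30, 26)])
```

Both calls lead to the same point on the curves, which is the substance of Dr. Houston's comment.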


REFERENCES 


[1] T. C. FRY, "A mathematical theory of rational inference", Scripta Mathematica, Vol.
2 (1934).

[2] G. F. HARDY, Trans. Faculty of Actuaries, Vol. 8 (1920), p. 181.

[3] G. J. LIDSTONE, "Laplace's antecedent-probability function", Math. Gazette, Vol. 25
(1941), p. 162.

[4] R. P. CROWELL AND E. C. MOLINA, "Deviation of random samples from average conditions
and significance to traffic men", Bell System Tech. Jour., Vol. 3 (1924).

[5] CATHERINE M. THOMPSON, "Tables of percentage points of the incomplete beta-function",
Biometrika, Vol. 32 (1941).





ENLARGEMENT METHODS FOR COMPUTING THE INVERSE MATRIX 


By Louis GUTTMAN 
Cornell University 


1. Summary. The enlargement principle provides techniques for inverting 
any nonsingular matrix by building the inverse upon the inverses of successively 
larger submatrices. The computing routines are relatively easily learned since 
they are repetitive. Three different enlargement routines are outlined: first- 
order, second-order, and geometric. None of the procedures requires much more 
work than is involved in squaring the matrix. 


2. Introduction. A set of methods is presented here for computing the inverse
matrix, based on what we shall call an enlargement principle. The principle
is to build the inverse upon the inverses of successively larger submatrices.
This leads to simple repetitive routines that are not unlike iterative steps, but
afford a direct solution.

The basis for such routines has also been noticed before,¹ but does not seem to
have attracted the attention it merits. A possible reason for this lack of attention
may be the belief that the methods apply only to a restricted class of matrices.
We establish a simple lemma in this paper which shows that the enlargement
methods apply to all nonsingular matrices, so that their use is perfectly
general.

The enlargement principle may be considered an opposite of the "condensation"
principle that governs Gauss' method of elimination and its variants such
as the Doolittle procedure and Aitken's "pivotal condensation."² It is interesting
that the same formula upon which the enlargement methods are based can
also serve as a foundation for the condensation methods, as is shown in section
7 below.

The enlargement methods have the following characteristics: 

(1) The first-order procedure outlined in the next section has been learned 
by statistical clerks in about ten minutes. People who calculate inverses only 
occasionally and forget the process between times should find the method as 
economical as those who must constantly compute inverses. 

(2) They are direct methods, and yield an exact answer with not much more 
work than is involved in squaring the matrix. 

(3) They can be adapted to electric punch-card systems, which will be efficient
when very large matrices are to be inverted.

1 It has appeared earlier in [2]. Waugh’s recent note [10] also rediscovers the basic for- 
mula although only a specialized use is suggested there. Professor Harold Hotelling has 
called my attention to reference [1], which overlaps substantially with the present paper, 
and to a use of an enlargement approach to computing latent roots and vectors [9]. I am
also indebted to Professor Hotelling for other helpful comments on the present paper. 


2 For an excellent summary and bibliography of direct and iterative methods for com- 
puting the inverse matrix see ([5], [6]). 




(4) A sequence of inverses is yielded. Exact inverses of successively larger 
submatrices are computed in the routines, and these inverses are often them- 
selves of interest. For correlation problems, this means that a sequence of sets 
of successively higher order multiple correlation constants is produced routinely. 

(5) The general formula upon which the methods are based allows many varia- 
tions in procedure, so that special adaptations can be easily made for special 
matrices. 

A “first-order” enlargement procedure for computing the inverse matrix will
be outlined in the next section. The proof for the method follows from the gen- 
eral formula in section 4. This procedure and formula are also described in 
[2]. Other enlargement routines are described in subsequent sections. Some 
additional formulas of relevance are discussed in section 8. 




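3. First-order enlargement. Let the matrix whose inverse is desired be

$$A_n = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{Vmatrix}.$$

The following sequence of successively larger principal submatrices will be assumed
to be nonsingular:

$$A_2 = \begin{Vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{Vmatrix}, \quad A_3 = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{Vmatrix}, \quad \cdots, \quad A_n.$$

If necessary, the rows and columns of $A_n$ can always be shifted to obtain such a
sequence. The following additional notation will be used:

$$B_i = (a_{1,i+1} \;\; a_{2,i+1} \;\; \cdots \;\; a_{i,i+1})$$
$$C_i = (a_{i+1,1} \;\; a_{i+1,2} \;\; \cdots \;\; a_{i+1,i})$$
$$d_i = a_{i+1,i+1}.$$

Thus, we can write

$$A_{i+1} = \begin{Vmatrix} A_i & B_i' \\ C_i & d_i \end{Vmatrix}, \qquad (i = 2, 3, \cdots, n-1).$$
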

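The first-order enlargement procedure is to compute in turn $A_2^{-1}, A_3^{-1}, \cdots, A_n^{-1}$.
The inverse of $A_2$ is computed by the traditional steps:
{1} Compute $\Delta = a_{11}a_{22} - a_{21}a_{12}$, and compute $1/\Delta$.
{2} Then

$$A_2^{-1} = \begin{Vmatrix} a_{22}/\Delta & -a_{12}/\Delta \\ -a_{21}/\Delta & a_{11}/\Delta \end{Vmatrix}.$$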







Remember that $B_2 = (a_{13} \;\; a_{23})$, $C_2 = (a_{31} \;\; a_{32})$, and that $d_2 = a_{33}$. The steps
for computing $A_3^{-1}$ are as follows:

{3} Compute $E_2 = A_2^{-1}B_2'$.

{4} Compute $f_2 = d_2 - C_2E_2$.

{5} Compute $1/f_2$.

{6} Compute $G_2 = f_2^{-1}E_2$, and compute $H_2 = f_2^{-1}C_2A_2^{-1}$.

{7} To each element in $A_2^{-1}$ add the product of the corresponding elements
in $E_2$ and $H_2$ to form $K_2 = A_2^{-1} + E_2H_2$.

Then the third order inverse is

$$A_3^{-1} = \begin{Vmatrix} K_2 & -G_2 \\ -H_2 & 1/f_2 \end{Vmatrix}.$$

In general, to obtain $A_{i+1}^{-1}$ from $A_i^{-1}$, $(i = 2, 3, \cdots, n-1)$, imitate³ steps
{3} through {7}:

{3'} Compute $E_i = A_i^{-1}B_i'$.
{4'} Compute $f_i = d_i - C_iE_i$.
{5'} Compute $f_i^{-1}$.
{6'} Compute $G_i = f_i^{-1}E_i$, and compute $H_i = f_i^{-1}C_iA_i^{-1}$.
{7'} Compute $K_i = A_i^{-1} + E_iH_i$. Then

$$A_{i+1}^{-1} = \begin{Vmatrix} K_i & -G_i \\ -H_i & 1/f_i \end{Vmatrix}.$$

By repeated applications of steps {3'} through {7'} to the successively larger
$A_i^{-1}$, $A_n^{-1}$ is attained.

If $A_n$ is symmetric, then almost half the work is saved, for then $B_i = C_i$,
$G_i = H_i'$, and $K_i$ is symmetric, $(i = 2, 3, \cdots, n-1)$.
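
The first-order routine may be sketched in code. The following Python fragment is an illustration only and not part of the routine as published: the name `enlarge_inverse` is hypothetical, exact rational arithmetic is an assumption made here so that the answer is exact, the enlargement is started from the first-order submatrix $a_{11}$ as footnote 3 allows, and no shifting of rows and columns is attempted, so every leading principal submatrix is assumed nonsingular.

```python
from fractions import Fraction


def enlarge_inverse(a):
    """Invert a square matrix (list of rows) by first-order enlargement.

    Sketch under stated assumptions: every leading principal submatrix
    is nonsingular, and no rows or columns are shifted.
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    inv = [[1 / a[0][0]]]                     # inverse of the first-order submatrix
    for i in range(1, n):
        b = [a[r][i] for r in range(i)]       # B_i', the new column above d_i
        c = [a[i][s] for s in range(i)]       # C_i, the new row left of d_i
        d = a[i][i]                           # d_i
        # step {3'}: E_i = A_i^{-1} B_i'
        e = [sum(inv[r][s] * b[s] for s in range(i)) for r in range(i)]
        # step {4'}: f_i = d_i - C_i E_i
        f = d - sum(c[s] * e[s] for s in range(i))
        # step {5'}: reciprocal of f_i (raises ZeroDivisionError if f_i = 0)
        rf = 1 / f
        # step {6'}: G_i = f_i^{-1} E_i and H_i = f_i^{-1} C_i A_i^{-1}
        g = [rf * x for x in e]
        h = [rf * sum(c[r] * inv[r][s] for r in range(i)) for s in range(i)]
        # step {7'}: K_i = A_i^{-1} + E_i H_i
        k = [[inv[r][s] + e[r] * h[s] for s in range(i)] for r in range(i)]
        # assemble the enlarged inverse from K_i, -G_i, -H_i and 1/f_i
        inv = [k[r] + [-g[r]] for r in range(i)] + [[-x for x in h] + [rf]]
    return inv
```

Each pass of the loop reuses the inverse obtained in the previous pass, so the intermediate values of `inv` are the exact inverses of the successively larger submatrices mentioned in the introduction.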

To help gauge the amount of work needed to arrive at $A_n^{-1}$, let us compare it
with the work that would be needed to square $A_n$. For the general asymmetric
case, $n^2$ product sums of $n$ terms each are required for $A_n^2$, a total of $n^3$
multiplications. With calculating machines, the sums of the products are accumulated,
so that no separate process of addition is involved. To reach $A_n^{-1}$ by the above
enlargement method, $n^3 - n$ multiplications are required. Most of the addition
is accomplished in the process by accumulative multiplication, but an additional
$\frac{n^3}{3} - \frac{n^2}{2} + \frac{n}{6} + n - 3$ terms have to be added otherwise. Furthermore,
$n - 1$ reciprocal numbers are needed. Thus, $A_n^{-1}$ involves somewhat fewer
multiplications than does $A_n^2$, but needs more additions, as well as some reciprocal
numbers.


3 Actually, these steps could be used immediately in place of steps {1} and {2} to compute
$A_2^{-1}$, by letting $i = 1$, and letting $A_1 = a_{11}$ (which may be assumed different from zero).
The traditional method, however, is quicker for the 2x2 matrix.







In linear multiple correlation problems, if $A_{i+1}$ is the correlation matrix of the
first $i + 1$ variates, then $E_i$ consists of the regression coefficients of the first $i$
variates for predicting the $(i + 1)$th variate, and $f_i$ is one minus the square of the
multiple correlation coefficient for this regression.


4. A lemma and the general formula. The enlargement procedure just out- 
lined is one of many possible routines which can be developed from a general 
formula for the inverse matrix in partitioned form. This formula seems to have 
appeared first in [2], where it is stated that the method applies only to the cases 
where $f_i \neq 0$ in step {4}. We shall establish here a lemma that shows that this
is no restriction, for the submatrix in step {4} is always nonsingular. Our lemma 
proves that the enlargement methods will invert any nonsingular matrix. 

Let $A_n$ be a nonsingular matrix of order $n$, partitioned in the form

$$A_n = \begin{Vmatrix} A & B' \\ C & D \end{Vmatrix}, \tag{1}$$

where $A$ is of order $m$, $(1 \le m < n)$, and will be assumed nonsingular. $B$ and
$C$ are of $n - m$ rows and $m$ columns, and $D$ is of order $n - m$.

The following lemma is needed to show that enlargement methods will invert 
any nonsingular matrix: 

Lemma. If in (1), both $A_n$ and $A$ are nonsingular, then the matrix

$$F = D - CA^{-1}B' \tag{2}$$

is nonsingular.
For the proof, postmultiply the first submatric column of $A_n$ by $A^{-1}B'$ and
subtract from the second, leaving

$$M = \begin{Vmatrix} A & O \\ C & F \end{Vmatrix}.$$

$M$ differs from $A_n$ only by an elementary transformation; hence its rank is that
of $A_n$. But clearly the rank of $M$ is the sum of the ranks of $A$ and $F$. Therefore,
the rank of $F$ is $n - m$, and $F$ is nonsingular.
The inversion formula itself is the following identity:

$$\begin{Vmatrix} A & B' \\ C & D \end{Vmatrix}^{-1} = \begin{Vmatrix} A^{-1} + A^{-1}B'F^{-1}CA^{-1} & -A^{-1}B'F^{-1} \\ -F^{-1}CA^{-1} & F^{-1} \end{Vmatrix}. \tag{3}$$

A direct verification that the identity holds can be obtained by multiplying the
right member in either direction by the right member of (1), yielding the unit
matrix.
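
The direct verification may be written out block by block. The lines below are a worked check added for illustration; they use only the definition $F = D - CA^{-1}B'$ and show the product of the partitioned matrix with the proposed inverse, taken in one direction (the other direction is similar).

```latex
\begin{align*}
A\,(A^{-1} + A^{-1}B'F^{-1}CA^{-1}) + B'(-F^{-1}CA^{-1})
  &= I + B'F^{-1}CA^{-1} - B'F^{-1}CA^{-1} = I, \\
A\,(-A^{-1}B'F^{-1}) + B'F^{-1}
  &= -B'F^{-1} + B'F^{-1} = O, \\
C\,(A^{-1} + A^{-1}B'F^{-1}CA^{-1}) + D\,(-F^{-1}CA^{-1})
  &= CA^{-1} - (D - CA^{-1}B')F^{-1}CA^{-1} = CA^{-1} - CA^{-1} = O, \\
C\,(-A^{-1}B'F^{-1}) + DF^{-1}
  &= (D - CA^{-1}B')F^{-1} = I.
\end{align*}
```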

In section 3, the formula exhibited for $A_{i+1}^{-1}$ at step {7'} is easily identified
as a special case of formula (3) where $n = i + 1$, $m = i$. $F$ corresponds to $f_i$,
which is a scalar number; hence $F^{-1}$ is easily computed in this case.













5. Second order enlargement. In formula (3), once $A^{-1}$ is given, the rest of
the work is essentially straightforward matric multiplication, except for computing
$F^{-1}$. In section 3, $F$ was easily inverted since it was of order unity. $F$
can also be easily inverted if it is of order two, so that a second order enlargement
procedure is feasible, computing $A_{i+2}^{-1}$ from $A_i^{-1}$. The steps are similar to those
in section 3 but involve larger matrices.

Letting $A_i$ have the same meaning as in section 3, define now $B_i$, $C_i$, and $D_i$
according to the partitioning

$$A_{i+2} = \begin{Vmatrix} A_i & B_i' \\ C_i & D_i \end{Vmatrix}.$$

Then $B_i$ and $C_i$ are of two rows and $i$ columns, and $D_i$ is of order two. Compute
$A_2^{-1}$ as in section 3. From then on, to compute $A_{i+2}^{-1}$ from $A_i^{-1}$, the steps are:
{3''} Compute $E_i = A_i^{-1}B_i'$.
{4''} Compute $F_i = D_i - C_iE_i$.
{5''} Compute $F_i^{-1}$ by steps {1} and {2} of section 3.
{6''} Compute $G_i = E_iF_i^{-1}$, and compute $H_i = F_i^{-1}C_iA_i^{-1}$.
{7''} Compute $K_i = A_i^{-1} + E_iH_i$.
Then

$$A_{i+2}^{-1} = \begin{Vmatrix} K_i & -G_i \\ -H_i & F_i^{-1} \end{Vmatrix}.$$

If $n$ is even, successive enlargements will lead to $A_n^{-1}$. If $n$ is odd, then $A_{n-1}^{-1}$ is
attained, from which $A_n^{-1}$ can be computed according to section 3.

The number of multiplications and additions for this procedure is the same as
for section 3. However, less writing is involved since only about half as many
$A_i$ are inverted. A disadvantage is that it is more complicated at each stage
than is the procedure of section 3.
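
The second order routine may be sketched as follows for a matrix of even order. This is an illustration under stated assumptions and not the author's own layout: the names `mul`, `inv2`, `enlarge_by_two` and `invert_even` are hypothetical, exact rational arithmetic is assumed, and the leading principal submatrices of orders 2, 4, 6, ... are assumed nonsingular.

```python
from fractions import Fraction


def mul(x, y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x[r][t] * y[t][s] for t in range(len(y)))
             for s in range(len(y[0]))] for r in range(len(x))]


def inv2(m):
    """Steps {1} and {2}: inverse of a 2 x 2 matrix of Fractions."""
    delta = m[0][0] * m[1][1] - m[1][0] * m[0][1]
    return [[m[1][1] / delta, -m[0][1] / delta],
            [-m[1][0] / delta, m[0][0] / delta]]


def enlarge_by_two(inv_a, bt, c, d):
    """One second order step: inv_a is A_i^{-1}, bt is B_i' (i x 2),
    c is C_i (2 x i), d is D_i (2 x 2)."""
    i = len(inv_a)
    e = mul(inv_a, bt)                                   # E_i = A_i^{-1} B_i'
    ce = mul(c, e)
    f = [[d[r][s] - ce[r][s] for s in range(2)] for r in range(2)]   # F_i
    f_inv = inv2(f)                                      # F_i^{-1}
    g = mul(e, f_inv)                                    # G_i = E_i F_i^{-1}
    h = mul(f_inv, mul(c, inv_a))                        # H_i = F_i^{-1} C_i A_i^{-1}
    eh = mul(e, h)
    k = [[inv_a[r][s] + eh[r][s] for s in range(i)] for r in range(i)]  # K_i
    top = [k[r] + [-v for v in g[r]] for r in range(i)]
    bottom = [[-v for v in h[r]] + f_inv[r] for r in range(2)]
    return top + bottom


def invert_even(a):
    """Invert a matrix of even order by repeated second order enlargement."""
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    inv = inv2([row[:2] for row in a[:2]])
    for i in range(2, n, 2):
        bt = [a[r][i:i + 2] for r in range(i)]
        c = [a[r][:i] for r in range(i, i + 2)]
        d = [a[r][i:i + 2] for r in range(i, i + 2)]
        inv = enlarge_by_two(inv, bt, c, d)
    return inv
```

For odd $n$ the same sketch would stop at order $n - 1$, and one first-order step would complete the work.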


6. Geometric enlargement. Another routine is that which may be called
geometric enlargement. Here, $A_{2i}^{-1}$ is computed from $A_i^{-1}$. The steps may be
described as follows. Letting $A_i$ have the same meaning as previously, redefine
$B_i$, $C_i$, and $D_i$ according to the partitioning

$$A_{2i} = \begin{Vmatrix} A_i & B_i' \\ C_i & D_i \end{Vmatrix}.$$

Then $B_i$, $C_i$, and $D_i$ are all, like $A_i$, square matrices of order $i$. Compute $A_2^{-1}$
according to steps {1} and {2}, and compute $A_4^{-1}$ according to steps {3''} through
{7''}. From then on, to compute $A_{2i}^{-1}$ from $A_i^{-1}$, the steps are formally the same
as before, with a complication in step {5'''}:







{3'''} Compute $E_i = A_i^{-1}B_i'$.

{4'''} Compute $F_i = D_i - C_iE_i$.

{5'''} Compute $F_i^{-1}$ by geometric enlargement in the same way as $A_i^{-1}$.

{6'''} Compute $G_i = E_iF_i^{-1}$, and compute $H_i = F_i^{-1}C_iA_i^{-1}$.

{7'''} Compute $K_i = A_i^{-1} + E_iH_i$.

Then,

$$A_{2i}^{-1} = \begin{Vmatrix} K_i & -G_i \\ -H_i & F_i^{-1} \end{Vmatrix}.$$

This method involves less writing than the others, but is more complicated.
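
Because the matrix $F_i$ is itself inverted by geometric enlargement, the routine is naturally recursive. The Python sketch below is an illustration only: the names `mul`, `add` and `geometric_inverse` are hypothetical, the recursion halves the matrix from the top down rather than doubling from the bottom up (the two orders of work coincide when $n$ is a power of two), and it assumes that the upper-left block and the corresponding $F$ are nonsingular at every level, which is a stronger condition than nonsingularity of $A_n$ alone.

```python
from fractions import Fraction


def mul(x, y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x[r][t] * y[t][s] for t in range(len(y)))
             for s in range(len(y[0]))] for r in range(len(x))]


def add(x, y, sign=1):
    """Elementwise x + sign * y for two matrices of the same shape."""
    return [[p + sign * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def geometric_inverse(a):
    """Invert a square matrix by splitting it into four blocks and
    applying the same routine to the upper-left block and to F."""
    n = len(a)
    if n == 1:
        return [[1 / Fraction(a[0][0])]]
    m = n // 2
    a11 = [row[:m] for row in a[:m]]        # A
    bt = [row[m:] for row in a[:m]]         # B'
    c = [row[:m] for row in a[m:]]          # C
    d = [row[m:] for row in a[m:]]          # D
    a_inv = geometric_inverse(a11)
    e = mul(a_inv, bt)                      # E = A^{-1} B'
    f = add(d, mul(c, e), sign=-1)          # F = D - C E
    f_inv = geometric_inverse(f)            # F^{-1} by the same routine
    g = mul(e, f_inv)                       # G = E F^{-1}
    h = mul(f_inv, mul(c, a_inv))           # H = F^{-1} C A^{-1}
    k = add(a_inv, mul(e, h))               # K = A^{-1} + E H
    top = [k[r] + [-v for v in g[r]] for r in range(m)]
    bottom = [[-v for v in h[r]] + f_inv[r] for r in range(n - m)]
    return top + bottom
```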


7. Condensation methods; special cases. Formula (3) also affords a basis for
condensation methods by “back solution.” For example, let $A$ be of order $m$,
where $m$ is one or two so that $A$ is easily inverted. Then $F$ is of order $n - m$,
and we will denote it by $F_{n-m}$. Partition $F_{n-m}$ into the form

$$F_{n-m} = \begin{Vmatrix} A^{(1)} & B^{(1)\prime} \\ C^{(1)} & D^{(1)} \end{Vmatrix},$$

where $A^{(1)}$ is again of order $m$, defining $F_{n-2m}$ of order $n - 2m$. Continue the
process until an $F_j$ is reached which is easily inverted, and solve backwards to reach
$F_{n-m}^{-1}$, and then $A_n^{-1}$, by repeated use of (3).

Formula (3) is of great help in those special cases where $A$ is large but easily
inverted, such as a diagonal matrix, orthogonal matrix, etc. The labor can then
be focussed on inverting an $F$ which is much smaller than $A_n$.




8. Further identities. It is of some interest to exhibit some matric identities
relevant to formula (3). Using the notation of section 4, let us seek the inverse of
$A_n$ partitioned in the form

$$A_n^{-1} = \begin{Vmatrix} W & X' \\ Y & Z \end{Vmatrix}. \tag{4}$$

An equation to be satisfied is

$$\begin{Vmatrix} W & X' \\ Y & Z \end{Vmatrix} \begin{Vmatrix} A & B' \\ C & D \end{Vmatrix} = \begin{Vmatrix} I & O \\ O & I \end{Vmatrix},$$

which yields the equations

$$WA + X'C = I \tag{5}$$
$$WB' + X'D = O \tag{6}$$
$$YA + ZC = O \tag{7}$$
$$YB' + ZD = I. \tag{8}$$




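If $A$ and $D$ are nonsingular, then from (6) and (7),

$$X' = -WB'D^{-1}, \qquad Y = -ZCA^{-1}. \tag{9}$$

Using (9) in (5) and (8), and remembering the lemma of section 4, we obtain

$$W = (A - B'D^{-1}C)^{-1}, \qquad Z = (D - CA^{-1}B')^{-1}. \tag{10}$$

Using (10) in (9) yields

$$X' = -(A - B'D^{-1}C)^{-1}B'D^{-1}, \qquad Y = -(D - CA^{-1}B')^{-1}CA^{-1}. \tag{11}$$

Putting (10) and (11) into (4) completes the formula

$$\begin{Vmatrix} A & B' \\ C & D \end{Vmatrix}^{-1} = \begin{Vmatrix} (A - B'D^{-1}C)^{-1} & -(A - B'D^{-1}C)^{-1}B'D^{-1} \\ -(D - CA^{-1}B')^{-1}CA^{-1} & (D - CA^{-1}B')^{-1} \end{Vmatrix}. \tag{12}$$

Comparing (3) with (12), we have the identities

$$(A - B'D^{-1}C)^{-1} = A^{-1} + A^{-1}B'(D - CA^{-1}B')^{-1}CA^{-1} \tag{13}$$
$$(A - B'D^{-1}C)^{-1}B'D^{-1} = A^{-1}B'(D - CA^{-1}B')^{-1}, \tag{14}$$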


which may of course be verified by direct simplification. 

An important feature of each of these identities is that the matrix in parentheses 
on the left is of order m, while that in parentheses on the right is of order n — m. 

A special case of (13) was noticed by the writer ([3], [4]) and of (14) by Ledermann
([7], [8]) and the writer ([3], [4]), in connection with regression problems
of factor analysis. In this special case, $A$ is a diagonal matrix and hence easily
inverted; $n - m$ is the number of common factors, which is usually small compared
with $m$; the correlation matrix of $m$ observed variates is given factored
into the form $A - B'D^{-1}C$; and the work of inverting the correlation matrix of
order $m$ is simplified essentially into inverting a much smaller matrix.

It should be noticed that (12), (13), and (14) assume that both A and D are 
nonsingular, where (3) assumes only that A is nonsingular (since then F must be 
nonsingular from the lemma of section 4). 
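
A small numerical instance may make the saving concrete. The Python sketch below is illustrative only: the function name `check_identities` and the particular numbers are assumptions made here, with $m = 2$, $n - m = 1$ and $A$ diagonal, so that the right member of (13) needs only the reciprocal of the scalar $D - CA^{-1}B'$ while the left member needs a second-order inverse.

```python
from fractions import Fraction as Fr


def inv2(m):
    """Inverse of a 2 x 2 matrix of Fractions (steps {1} and {2})."""
    delta = m[0][0] * m[1][1] - m[1][0] * m[0][1]
    return [[m[1][1] / delta, -m[0][1] / delta],
            [-m[1][0] / delta, m[0][0] / delta]]


def check_identities():
    """Return both members of (13) and of (14) for one hypothetical example."""
    a = [[Fr(2), Fr(0)], [Fr(0), Fr(3)]]    # A, diagonal of order 2
    b = [Fr(1), Fr(1)]                      # B (one row); B' is its transpose
    c = [Fr(1), Fr(2)]                      # C (one row)
    d = Fr(5)                               # D, of order 1

    # left member of (13): invert A - B'D^{-1}C directly
    lhs13 = inv2([[a[r][s] - b[r] * c[s] / d for s in range(2)]
                  for r in range(2)])

    # right member of (13): only the scalar F = D - CA^{-1}B' is inverted
    a_inv = [[1 / a[0][0], Fr(0)], [Fr(0), 1 / a[1][1]]]
    ainv_bt = [a_inv[r][r] * b[r] for r in range(2)]    # A^{-1}B' (column)
    c_ainv = [c[s] * a_inv[s][s] for s in range(2)]     # CA^{-1} (row)
    f = d - sum(c[s] * ainv_bt[s] for s in range(2))
    rhs13 = [[a_inv[r][s] + ainv_bt[r] * c_ainv[s] / f for s in range(2)]
             for r in range(2)]

    # both members of (14)
    lhs14 = [sum(lhs13[r][s] * b[s] for s in range(2)) / d for r in range(2)]
    rhs14 = [ainv_bt[r] / f for r in range(2)]
    return lhs13, rhs13, lhs14, rhs14
```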


REFERENCES 

[1] W. J. Duncan, “Some devices for the solution of large sets of simultaneous linear equations,”
Phil. Mag., Vol. 35 (1944), pp. 660-670.

[2] R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge
Univ. Press, 1938.

[3] Louis Guttman, “Multiple rectilinear prediction and the resolution into components,”
Psychometrika, Vol. 5 (1940), pp. 75-99.

[4] Louis Guttman and Jozef Cohen, “Multiple rectilinear prediction and the resolution
into components II,” Psychometrika, Vol. 8 (1943), pp. 169-183.

[5] Harold Hotelling, “Some new methods in matrix calculation,” Annals of Math.
Stat., Vol. 14 (1943), pp. 1-34.

[6] Harold Hotelling, “Further points on matrix calculation and simultaneous equations,”
Annals of Math. Stat., Vol. 14 (1943), pp. 440-441.

[7] Walter Ledermann, “A shortened method for the estimation of mental factors by
regression,” Nature, Vol. 141 (1938), p. 246.

[8] Walter Ledermann, “On a shortened method for the estimation of mental factors by
regression,” Psychometrika, Vol. 4 (1939), pp. 109-116.

[9] J. Morris and J. W. Head, “Lagrangian frequency equations, an ‘escalator’ method
for numerical solution,” Aircraft Eng., Vol. 14 (1942), pp. 312-314, 316.

[10] F. V. Waugh, “A note concerning Hotelling’s method of inverting a partitioned
matrix,” Annals of Math. Stat., Vol. 16 (1945), pp. 216-217.











THE FREQUENCY DISTRIBUTION OF DEVIATES FROM MEANS AND 
REGRESSION LINES IN SAMPLES FROM A MULTIVARIATE 
NORMAL POPULATION 


By D. J. FINNEY 


Oxford University, England 


1. Summary. The joint frequency distribution has been found for any set
of the $(n - k)$ deviates from their sample mean of each of the $t$ variates in a sample
from a multivariate normal population. Expressions for the variance of any
single deviate in this distribution, the correlation coefficient between any pair
of deviates, and certain partial correlation coefficients between any pair have also
been obtained.

These results have been generalized so as to include the corresponding properties
of deviates from a set of $t$ multiple linear regression equations estimated
from the sample, the $m$ independent variates being the same for each of the $t$
dependent.


2. Introduction. Some years ago, Irwin published results relating to the frequency
distribution of the deviations of individual observations from the mean
of a sample drawn from a normal population (see [1]). He derived an expression
for the joint distribution of any number of these deviates, which distribution
is always of the normal multivariate form, and thence obtained the total and
partial correlation coefficients between any pair of the deviates.

The purpose of this paper is to discuss the generalization of Irwin’s problem, 
firstly to the properties of the deviates of individual observations from the mean 
in a sample from a multivariate normal population and secondly to the properties 
of deviates from a regression equation instead of from a mean. So far as is known
to the writer Irwin’s results are of little practical importance, and these generali- 
zations are probably of no practical value whatsoever. Nevertheless, they have 
some interest as additions to the knowledge of the mathematical properties of the 
normal frequency function, and for that reason alone they are put on record here. 


3. Deviations from the sample mean. Irwin based his discussion on a normal
population with mean $m$ and variance $\sigma^2$, but the algebra is simplified a little,
without any real loss of generality in the final results, if, by means of a preliminary
transformation, these parameters of position and scale are made zero and
unity respectively. The multivariate normal distribution in the $t$ variates ${}_iy$,
$(i = 1, 2, \cdots, t)$, each with mean zero and variance unity, has the frequency
function

$$\frac{1}{(2\pi)^{\frac12 t} R^{\frac12}} \exp\left\{-\frac{1}{2R}\,\rho^{ij}\,{}_iy\,{}_jy\right\}, \tag{1}$$



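where $i, j = 1, 2, \cdots, t$; $\rho^{ij}$ is the cofactor of the element $\rho_{ij}$ in the determinant
of population correlation coefficients

$$R = \begin{vmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1t} \\ \rho_{12} & 1 & \rho_{23} & \cdots & \rho_{2t} \\ \vdots & \vdots & \vdots & & \vdots \\ \rho_{1t} & \rho_{2t} & \rho_{3t} & \cdots & 1 \end{vmatrix}. \tag{2}$$

A summation convention for the affixes $i$, $j$ is understood throughout this paper,
except when the contrary is explicitly stated.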

Let $({}_iy_p)$ represent a sample of $n$ independent sets of values of the $t$ variates
randomly selected from the population, $(p = 1, 2, \cdots, n)$. Then the element
of probability for the sample is

$$\frac{1}{(2\pi)^{\frac12 tn} R^{\frac12 n}} \exp\left\{-\frac{1}{2R}\,\rho^{ij} \sum_p {}_iy_p\,{}_jy_p\right\} \prod_{i,p} d({}_iy_p). \tag{3}$$
per ite 


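If ${}_i\bar{y}$ is the mean of the $n$ sample values of ${}_iy$, the deviates from the mean are
$({}_iY_p)$, where

$${}_iY_p = {}_iy_p - {}_i\bar{y} = \sum_q \left(\delta_{pq} - \frac{1}{n}\right) {}_iy_q, \tag{4}$$

the summation being taken over $q = 1, 2, \cdots, n$ with

$$\delta_{pq} = \begin{cases} 1 & \text{if } p = q \\ 0 & \text{if } p \neq q. \end{cases}$$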

Now the ${}_iY_p$ are linear combinations of normally distributed variates, and are
therefore themselves normally distributed. Clearly

$$E({}_iY_p) = 0 \tag{5}$$

and, from an expansion by means of equation (4) using

$$E({}_iy_p\,{}_jy_q) = \delta_{pq}\,\rho_{ij},$$

$$E({}_iY_p\,{}_jY_q) = \left(\delta_{pq} - \frac{1}{n}\right) \rho_{ij}, \tag{6}$$

where $\rho_{ii} = 1$ (not summed). Consequently the variance of any one deviate is

$$\sigma^2({}_iY_p) = \frac{n-1}{n}, \tag{7}$$

and the correlation coefficient between any pair is

$$\rho({}_iY_p, {}_jY_q) = \frac{n\delta_{pq} - 1}{n - 1}\,\rho_{ij}. \tag{8}$$

Equation (7) and equation (8) for the particular case of $i = j$ agree with the
well-known results that Irwin has already given as equations (10) of his paper.
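
Equations (6) to (8) can be checked exactly for a single variate, since the deviates are obtained from the observations by the matrix with elements $\delta_{pq} - 1/n$. The Python sketch below is an illustration only; the names `deviate_covariance` and `deviate_correlation` are hypothetical, and the observations are assumed independent with unit variance so that the covariance matrix of the deviates is that matrix multiplied by its transpose.

```python
from fractions import Fraction


def deviate_covariance(n):
    """Covariance matrix of the n deviates from the sample mean of one
    variate, computed as M M' with M = (delta_pq - 1/n)."""
    m = [[Fraction(int(p == q)) - Fraction(1, n) for q in range(n)]
         for p in range(n)]
    return [[sum(m[p][r] * m[q][r] for r in range(n)) for q in range(n)]
            for p in range(n)]


def deviate_correlation(n, p, q, rho=Fraction(1)):
    """Equation (8): correlation between the p-th deviate of variate i and
    the q-th deviate of variate j, where rho stands for rho_ij."""
    delta = 1 if p == q else 0
    return Fraction(n * delta - 1, n - 1) * rho
```

With $n = 5$, for example, the diagonal elements of the computed covariance matrix are $4/5$ as in (7), and the ratio of an off-diagonal element to a diagonal one is $-1/4$ as in (8) with $i = j$.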







For any $i$, only $(n - 1)$ of the deviates ${}_iY_p$ are functionally independent. The
joint distribution of these for $p = 1, 2, \cdots, (n - k)$ may be obtained from an
inversion of the matrix of correlation coefficients. If $\Delta$ is the determinant of
this matrix and $\Delta({}_iY_p, {}_jY_q)$ the cofactor corresponding to the two elements
specified, this inversion shows that

$$\frac{\Delta({}_iY_p, {}_jY_q)}{\Delta} = \left(\delta_{pq} + \frac{1}{k}\right) \frac{\rho^{ij}}{R} \cdot \frac{n-1}{n}. \tag{9}$$

The joint distribution is therefore

$$\text{const.} \times \exp\left\{-\frac{1}{2R}\,\rho^{ij} \sum_{p,q \le n-k} \left(\delta_{pq} + \frac{1}{k}\right) {}_iY_p\,{}_jY_q\right\} \prod_{i,p} d({}_iY_p). \tag{10}$$

Now $\Delta$ may be evaluated as

$$\Delta = \left(\frac{n}{n-1}\right)^{t(n-k-1)} \left(\frac{k}{n-1}\right)^{t} R^{n-k},$$

and the constant multiplier in equation (10) is therefore

$$\frac{(n/k)^{\frac12 t}}{[(2\pi)^t R]^{\frac12 (n-k)}}.$$

From equation (9), the partial correlation coefficient between any two of the
variates in the distribution (10), the remaining $t(n - k) - 2$ being held constant,
is written down as

$$\text{partial correlation coefficient between } {}_iY_p,\ {}_jY_q = -\frac{k\delta_{pq} + 1}{k + 1} \cdot \frac{\rho^{ij}}{(\rho^{ii}\rho^{jj})^{\frac12}}; \tag{12}$$

the summation convention is suspended for this equation.


4. Deviations from regression equations. The results obtained in section
three may be generalized so as to relate to the frequency distribution of deviates
from linear or polynomial regression equations instead of to deviates from means.
Suppose that there are $m$ independent variates $x^\alpha$, $(\alpha = 1, 2, \cdots, m)$, which
take values $x_p^\alpha$ corresponding to the sample observations ${}_iy_p$; polynomial regressions
may be included by taking powers of an $x$ as separate variates. If a
conventional variate $x^0$, whose value is always unity, be introduced, the regression
equation of ${}_iy$ on $x^\alpha$, $(\alpha = 0, 1, 2, \cdots, m)$, may be written

$${}_i\eta = {}_ib^\alpha x^\alpha, \tag{13}$$

where a summation convention is understood for $\alpha = 0, 1, \cdots, m$ and the
regression coefficients are the solutions of the normal equations

$${}_ib^\alpha \sum_p x_p^\alpha x_p^\beta = \sum_p {}_iy_p\,x_p^\beta. \tag{14}$$







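Write

$$B^{\alpha\beta} = \sum_p x_p^\alpha x_p^\beta, \tag{15}$$

and let $(B_{\alpha\beta})$ be the inverse matrix of $(B^{\alpha\beta})$.
Then the solutions of equations (14) are

$${}_ib^\alpha = B_{\alpha\beta} \sum_p {}_iy_p\,x_p^\beta. \tag{16}$$

If the deviation of ${}_iy_p$ from the regression equation (13) is ${}_iZ_p$, then

$${}_iZ_p = {}_iy_p - {}_i\eta_p = \sum_q \left(\delta_{pq} - B_{\alpha\beta}\,x_p^\alpha x_q^\beta\right) {}_iy_q, \tag{17}$$

the summation for $q$ being over $q = 1, 2, \cdots, n$. As for equation (5),

$$E({}_iZ_p) = 0. \tag{18}$$

Also

$$E({}_iZ_p\,{}_jZ_q) = \left(\delta_{pq} - B_{\alpha\beta}\,x_p^\alpha x_q^\beta\right) \rho_{ij}, \tag{19}$$

since by definition

$$B^{\alpha\beta} B_{\beta\gamma} = \delta_{\alpha\gamma}.$$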


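Write now $\theta$ for the square matrix of $(m + 1)$ rows and columns whose elements
are the $B^{\alpha\beta}$, and $X_p$ for the single column matrix of values $x$ corresponding to
the $p$th observation; i.e.

$$\theta = (B^{\alpha\beta}) \tag{20}$$

and

$$X_p = \begin{Vmatrix} x_p^0 \\ x_p^1 \\ \vdots \\ x_p^m \end{Vmatrix}. \tag{21}$$

Write also

$$\theta_{p,q,r,\cdots} = \theta - X_pX_p' - X_qX_q' - X_rX_r' - \cdots. \tag{22}$$

Then

$$|\theta_p| = |\theta| \cdot (1 - B_{\alpha\beta}\,x_p^\alpha x_p^\beta),$$

and

$$|\theta_{pq}| - |\theta_{pq} + X_pX_q'| = -|\theta| \cdot B_{\alpha\beta}\,x_p^\alpha x_q^\beta.$$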










Hence, from equation (19), the variance of a deviate may be written

$$\sigma^2({}_iZ_p) = \frac{|\theta_p|}{|\theta|}, \tag{23}$$

and the correlation coefficient between any pair of deviates is

$$\rho({}_iZ_p, {}_jZ_q) = \begin{cases} \rho_{ij}, & (p = q) \\ \dfrac{|\theta_{pq}| - |\theta_{pq} + X_pX_q'|}{\{|\theta_p|\,|\theta_q|\}^{\frac12}}\,\rho_{ij}, & (p \neq q). \end{cases} \tag{24}$$

For any $i$, only $(n - m - 1)$ of the deviates ${}_iZ_p$ are functionally independent.
The joint distribution of these for $p = 1, 2, \cdots, (n - k)$ and any $k \ge m + 1$
may be found by inversion of the matrix of correlation coefficients obtained from
equation (24). The multiplier of the exponential in this distribution of $t(n - k)$
variables is


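$$\frac{|\theta|^{\frac12 t(n-k)}}{(2\pi)^{\frac12 t(n-k)} R^{\frac12 (n-k)} D^{\frac12 t}},$$

where

$$D = \begin{vmatrix} |\theta_1| & |\theta_{12}| - |\theta_{12} + X_1X_2'| & \cdots & |\theta_{1,n-k}| - |\theta_{1,n-k} + X_1X_{n-k}'| \\ |\theta_{12}| - |\theta_{12} + X_1X_2'| & |\theta_2| & \cdots & |\theta_{2,n-k}| - |\theta_{2,n-k} + X_2X_{n-k}'| \\ \vdots & \vdots & & \vdots \\ |\theta_{1,n-k}| - |\theta_{1,n-k} + X_1X_{n-k}'| & |\theta_{2,n-k}| - |\theta_{2,n-k} + X_2X_{n-k}'| & \cdots & |\theta_{n-k}| \end{vmatrix}.$$

Since $\theta$ is positive definite, there exists a non-singular matrix $K$ such that

$$K\theta K' = I.$$

Then the $X_p$ may be transformed to new column matrices $W_p$ by

$$KX_p = W_p = \begin{Vmatrix} w_p^0 \\ w_p^1 \\ \vdots \\ w_p^m \end{Vmatrix},$$

and consequently

$$X_p = K^{-1}W_p.$$

It follows that

$$|\theta_p| = |K^{-1}(I - W_pW_p')K'^{-1}|,$$

which may be reduced to the form

$$|\theta_p| = |\theta| \cdot (1 - w_p^\alpha w_p^\alpha).$$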










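Similarly

$$|\theta_{pq}| - |\theta_{pq} + X_pX_q'| = -|\theta|\,w_p^\alpha w_q^\alpha.$$

Hence

$$D = |\theta|^{n-k} \begin{vmatrix} 1 - w_1^\alpha w_1^\alpha & -w_1^\alpha w_2^\alpha & \cdots & -w_1^\alpha w_{n-k}^\alpha \\ -w_1^\alpha w_2^\alpha & 1 - w_2^\alpha w_2^\alpha & \cdots & -w_2^\alpha w_{n-k}^\alpha \\ \vdots & \vdots & & \vdots \\ -w_1^\alpha w_{n-k}^\alpha & -w_2^\alpha w_{n-k}^\alpha & \cdots & 1 - w_{n-k}^\alpha w_{n-k}^\alpha \end{vmatrix}$$

$$= |\theta|^{n-k} \cdot |I - W_1W_1' - W_2W_2' - \cdots - W_{n-k}W_{n-k}'|$$

$$= |\theta|^{n-k-1}\,|\theta_{1,2,\cdots,(n-k)}|.$$

Thus, finally, the constant in the distribution is found to be

$$\frac{1}{\{(2\pi)^t R\}^{\frac12 (n-k)}} \left\{\frac{|\theta|}{|\Omega_k|}\right\}^{\frac12 t},$$

in which $\Omega_k$ has been written for $\theta_{1,2,\cdots,(n-k)}$, a matrix of the same form as $\theta$
but calculated from the last $k$ sets of observations only.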

The cofactors of the matrix of correlation coefficients, required for the coefficients
of the quadratic form in the distribution, can be derived in a similar manner.
The distribution may be written

$$\frac{1}{\{(2\pi)^t R\}^{\frac12 (n-k)}} \left\{\frac{|\theta|}{|\Omega_k|}\right\}^{\frac12 t} \exp\left\{-\frac{1}{2R}\,\rho^{ij} \sum_{p,q \le n-k} \left(\delta_{pq} - 1 + \frac{|\Omega_k + X_pX_q'|}{|\Omega_k|}\right) {}_iZ_p\,{}_jZ_q\right\} \prod_{i,p} d({}_iZ_p), \tag{26}$$

of which the distribution (10) is easily seen to be the particular case for $m = 0$.
From (26), the partial correlation coefficient between any pair of deviates,
${}_iZ_p$ and ${}_jZ_q$, may be written down as

$$-\frac{\{|\Omega_k + X_pX_q'| + (\delta_{pq} - 1)\,|\Omega_k|\}\,\rho^{ij}}{\{|\Omega_k + X_pX_p'| \cdot |\Omega_k + X_qX_q'|\}^{\frac12} (\rho^{ii}\rho^{jj})^{\frac12}}; \tag{27}$$

in this expression the summation convention is again suspended.








REFERENCE 


[1] J. O. Irwin, “On the frequency distribution of any number of deviates from the
mean of a sample from a normal population and the partial correlations between
them,” Roy. Stat. Soc. Jour., Vol. 92 (1929), pp. 580-584.





ON THE ASYMPTOTIC DISTRIBUTIONS OF CERTAIN STATISTICS 
USED IN TESTING THE INDEPENDENCE BETWEEN SUCCESSIVE 
OBSERVATIONS FROM A NORMAL POPULATION 


By P. L. Hsu 


Columbia University 


1. The statistics to be considered here have the general expression

$$T = \frac{Q}{S}, \qquad Q = \sum_{i,j=1}^{N} a_{ij}(x_i - \bar{x})(x_j - \bar{x}), \qquad S = \sum_{i=1}^{N} (x_i - \bar{x})^2,$$


where $(x_1, \cdots, x_N)$ is a sample from a normal population whose mean and variance
can evidently be assumed to be 0 and 1 respectively.¹ The purpose of this
note is to study the asymptotic distribution of $T$ assuming that the $x_i$ are independent.
The whole work may be regarded as a straightforward application of
Cramér’s theory of asymptotic expansion (see [1], pp. 69-88).

If $A = [a_{ij}]$ and $\gamma$ is the row vector $N^{-\frac12}[1, 1, \cdots, 1, 1]$ the quadratic form $Q$
has the matrix $(I - \gamma'\gamma)A(I - \gamma'\gamma)$. The latent roots of this matrix, which are
also the latent roots of $A(I - \gamma'\gamma)^2 = A(I - \gamma'\gamma)$, will be denoted by $0, \lambda_1, \cdots, \lambda_n$,
with $n = N - 1$. Then $Q$ and $S$ can be simultaneously diagonalized (by a
rotation of the $N$-dimensional space), so that

$$Q = \sum_{r=1}^{n} \lambda_r y_r^2, \qquad S = \sum_{r=1}^{n} y_r^2,$$

where the $y_r$ are again independently and normally distributed with zero mean
and unit variance.

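We shall make the following assumptions:

(a) $|\lambda_r| \le 1$ for all $r$.

(b) There is a positive number $c$ independent of $n$ such that

$$\sum_{r=1}^{n} (\lambda_r - \bar\lambda)^2 \ge cn, \qquad \text{where } \bar\lambda = \frac{1}{n} \sum_{r=1}^{n} \lambda_r.$$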


VV 2>5 (re ~ X)*2 
r=1 

Vn? — Qn? : 

$$X_r = (\lambda_r - \bar\lambda - x)(y_r^2 - 1), \qquad G(x) = \Pr\{T \le \bar\lambda + x\}.$$

Write

$$s_m(x) = \sum_{r=1}^{n} (\lambda_r - \bar\lambda - x)^m.$$


1 The exact and the approximate distribution of such statistics were a recent subject of
study by a number of statisticians. See W. J. Dixon, ‘‘Further contributions to the prob- 
lem of serial correlation,’? Annals of Math. Stat., Vol. 15 (1944), pp. 119-144. Further 
references are listed in Dixon’s paper. 




Then it can easily be verified that

$$G(x) = \Pr\left\{\frac{\sum_{r=1}^{n} X_r}{\sqrt{2s_2(x)}} \le \frac{nx}{\sqrt{2s_2(x)}}\right\}.$$

This expression of $G(x)$ shows that the application of Cramér’s expansion is at
hand, since $E(X_r) = 0$ and $2s_2(x)$ is the variance of $\sum X_r$. Let $\rho_{kn}$ and $T_{kn}$
stand for the same quantities as defined in Cramér’s work (see [1], pp. 70-71).
Since moments of all order of $X_r$ exist, we may use $2k + 2$ in place of $k$. We have

» mg ex49(2) 
= Mk S2k4+2\0 ofa 


P2k+2,n = 2 k+l» T x42. = 4ps tht?’ 
nag So(x) 2k+2,n 
n 


where m = E(y* — 1)*** and y is anormal variate with mean 0 and variance 1. 

By virtue of assumption (a) | 77| < 1. Therefore we may confine ourselves 
to the range of values for which |} + z| <1. Then|A,-—A—z| <2. Also, 
by assumption (b), s:(7) > D(A, — i)? > cn. Hence Pox+2,.n, and in conse- 
quence +/ nT x42,» , are less than some constant independent of n and x. The 
remainder of Cramer s expansion, if it is justifiable, will therefore be less than 
Mn‘, where M is independent of nandz. The justification consists in verifying 
that the following condition is satisfied: if f,(t) is the characteristic function of 
X, and A is any positive number, then 


- T 2k+2,n 


‘Lub. II |\f@| for |t|> /283(z) 


r=] 


is less than M,Tx42,n , Where M, is independent of n and zx (see [1], p. 85). Since 
Tx42.n < 3°/n’ and s(x) > c/n, it is sufficient to show that, if a and A are 
any positive numbers and if 


U = lub. I] \f-(t)| for |t| >a, 
r=1 


then U < Mon™ , where Mz is independent of n andz. Now 


lf(t)| = {1+ 4fQ, —X — 27} 
whence 


U = JJ {1 4+ 40°, — X — 2}. 
r=) 


Let u be the number of A, for which (A, — } — z)? < 3c. Then cn < s(x) < 
3c(n — wp) + 4y; hence en < (8 — c)wand 


U < (1+ 2a’) < (1 + 2a’) orto 


This shows that the desired condition on U is satisfied, and that therefore 
Cramér’s procedure can be adopted. 


*This follows from the fact that Ps42.n > 1. Cf. Cramér, [1], p. 70. 













Wherever Cramér’s asymptotic expansion is valid, the terms in the expansion 
are most conveniently obtained with the help of Cornish and Fisher’s symbolic 
expression (see [2]): 

g UB valatasty+ WAN ya dtldz)—--- @ 9) 


where 


1 z 
te) = Fal oe 


and y; is the jth semi-invariant of the random variable whose distribution is 
under asymptotic expansion. In the present case we have 


yi — B(x) 
j! = nig ’ 


where 


1 
gii-2 7, Si) 


ii 3 2" 
J ( o(z)) 


n 
Hence we may express our result as follows: 
2k+1 3 j 
—1)’B;(xz) (d 
a) Gte) = exp| 2 AVE (2)"] ove + Rue, 
where | Ri(x) | < Mn™*, and M is independent of n and x, The symbolic ex- 


ponential in (1) is to be expanded as far as and including the term in Oe caatiiadl 


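2. Let us apply the result (1) to the following three statistics: $T_\alpha = Q_\alpha/S$,
$(\alpha = 1, 2, 3)$, where

$$Q_1 = \sum_{i=1}^{N} (x_i - \bar{x})(x_{i+1} - \bar{x}) \quad \text{with } x_{N+1} = x_1,$$

$$Q_2 = \tfrac12 (x_1 - \bar{x})^2 + \tfrac12 (x_N - \bar{x})^2 + \sum_{i=1}^{N-1} (x_i - \bar{x})(x_{i+1} - \bar{x}),$$

$$Q_3 = \sum_{i=1}^{N-1} (x_i - \bar{x})(x_{i+1} - \bar{x}).$$

$T_2$ is simply related with $T^* = Q^*/S$, where

$$Q^* = \sum_{i=1}^{N-1} (x_{i+1} - x_i)^2,$$

for we have $Q_2 = S - \tfrac12 Q^*$, whence $T_2 = 1 - \tfrac12 T^*$. We shall write $\lambda_r^{(\alpha)}$ for the
$\lambda$’s corresponding to $Q_\alpha$, and

$$b_{m\alpha} = \sum_{r=1}^{n} (\lambda_r^{(\alpha)})^m, \qquad (\alpha = 1, 2, 3).$$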
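
The relation $Q_2 = S - \tfrac12 Q^*$ is an algebraic identity in the observations and may be checked on any sample. The Python sketch below is illustrative only: the name `serial_forms` is hypothetical, and the observations are assumed to be integers so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def serial_forms(x):
    """Return S, Q_1, Q_2, Q_3 and Q* for a list of integer observations."""
    n = len(x)
    mean = Fraction(sum(x), n)
    d = [Fraction(v) - mean for v in x]            # deviations from the mean
    s = sum(v * v for v in d)                      # S
    q3 = sum(d[i] * d[i + 1] for i in range(n - 1))
    q1 = q3 + d[n - 1] * d[0]                      # circular term, x_{N+1} = x_1
    q2 = (d[0] ** 2 + d[n - 1] ** 2) / 2 + q3
    q_star = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1))
    return s, q1, q2, q3, q_star
```

For any sample returned in this way, `q2` equals `s - Fraction(q_star) / 2`, which is the relation used above to pass from $T^*$ to $T_2$.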










(i) For Q, we have 4 = cos zs (see [3]). Since 


1 - * l = Mm : : 
™p = — tA 540m =— . s(ej—m)6 
cos 9 (e* +e") 9 2 (") e : 


we have 


bmi = = (") > #, where ¢ = ¢rcimen 
2 1 


If m < n, then 


> =-1 if j ¥ 4m, =n if j = 3m. 


r=1 
i N{(m ae 
Hence, for m < 1, bm = —1if mis odd, bm = —|, — 1 if mis even. 
2" \3m 
In particular 
n 2 
AM = —-, D AM — 1%)? = tae >04n if n>Z. 

n ral 2n 
Hence assumptions (a) and (b) are true (form > 7). The s;(z) are conveni- 
ently computed with the help of b,.1. The 8;(x) are then computed to yield 
the terms in (1). 


(ii) The \’s corresponding to Q* are 4 sin’ pe (see [4]). Hence 
\? = cos. 
N 
a —" ‘ ‘ N({m 
By a computation similar to that in (i) we easily obtain b,2. = ao teas ~ 1 for 
2 
even m and bm: = 0 for odd m, provided m < 2n. «In particular, \” = 0, 
n—1 


Tae” — {*)* = >-4n for n > 5. Hence assumptions (a) and (b) are 


2 
true (for n > 5). 
(iii) In the case of Q3 the matrix A is 


0 3 0| 
2 0 3 
1 
2 
A= 
0 3 
|| 0 % OQ] 


whose latent roots are cos mt/(N + 1), (t = 1, --- , N) (see [5)), all less than or 
equal to unity in absolute value. It follows that the same is true for the \~”. 




Hence assumption (a) is true. Unlike the two previous cases, there is no sim- 
ple expression for b,,3. With the help of the formula 


bms = tr {A(T — y’y)}” 
we may compute b,,3 for small values of m. Thus 


ee 
n+ 1 
2n — 1 i n 
n+1 (n + 1)? 
_3(n — 1) — 1) 3n(2n ~i) nt 
n+l 2(n + 1)? (n + 1)8 
_38n—2 8n—11 | 4n(n—1) , (Qn — 1) _ 2n*(2n — 1) r 
ba = 3 Ba Ft Fle Qn+ i! m@+iy + @ +H 
_ —5(4n — 7) , 5n(8n — 11) , 5(2n — 1)(m — 1) _ n(n — 1) 
~  A(n +1) 8(n + 1)? 2(n + 1)? (n + 1)8 
_ 5n(2n — 1)° 5n'(2n i n° 
4(n+1)® © 2n+1® (n+Ih 


bs = — 


ayn - i) y=5- oT + ap > On forn > 10. 


Hence assumption (b) is true (for n > 10). Using these values of bn3 we may 
compute 63(x), Bs(x) and @;(x). By (1) we have 


im - 


G(x) = &(2) — 4 Bala) @) + 5 Gu(x)¥ (x) + 48%(0)® (x) 


* 4 (Bs(x) B(x) — Ba(x)Ba(x)O(z) + $63(z)b(z)) + R(@), 
where | R(x) | < Mn™ and M is independent of n and z. 


REFERENCES 

[1] H. Cramér, Random Variables and Probability Distributions, Cambridge Tract No. 36,
1937.

[2] E. A. Cornish and R. A. Fisher, Revue de l’Institut International de Statistique, 1937.

[3] R. L. Anderson, “Distribution of the serial correlation coefficient”, Annals of Math.
Stat., Vol. 13 (1942), pp. 1-13.

[4] T. Koopmans, “Serial correlation and quadratic forms in normal variables”, Annals of
Math. Stat., Vol. 13 (1942), pp. 14-33.

[5] J. von Neumann, “A further remark concerning the distribution of the ratio of the
mean square successive difference to the variance”, Annals of Math. Stat., Vol.
13 (1942), pp. 86-88.





NOTES 


This section is devoted to brief research and expository articles, notes on
methodology and other short items. 




ESTIMATING THE PARAMETERS OF A RECTANGULAR 
DISTRIBUTION 


By A. GEORGE CARLTON 


Columbia University 


1. Introduction. In this note, the range and midrange of the sample are
shown to be a pair of sufficient statistics, and maximum likelihood estimates,
for the true range and true mean of a rectangular distribution; exact and limiting
distribution of midrange, range, and their ratio are derived; the “efficiencies”
of the sample mean and median as estimates of the true mean are calculated;
and the limiting distribution of the difference between two sample midranges is
derived. All the limiting distributions are non-normal, and the error of estimate
is of order $n^{-1}$ rather than the customary order $n^{-\frac12}$. The limiting distribution
of midrange, and the limiting ratio of variances of the midrange and sample
mean were given by Fisher [1].

$f(x)$ and $F(x)$ are used throughout to designate the probability density function
of $x$ and the distribution function (cumulative probability function) of $x$;
the argument will also indicate the random variable being considered.


2. Exact distribution of midrange, range, and their ratio. Let $x_1, \cdots, x_n$
be a set of n independent observations on a random variable having the rectangular
distribution $f(x) = 1/L$, $(\theta - L/2 < x < \theta + L/2)$, where $\theta$ is the true mean,
and $L$ the true range. The minimum observation $u$ and the maximum observation
$v$ are a pair of sufficient statistics for $\theta$ and $L$, as the conditional distribution
of the remaining observations for given $u$ and $v$ is independent of $\theta$ and $L$:

$$f(x_1, \cdots, x_n \mid u, v) = (v - u)^{-(n-2)}.$$

The midrange $\hat\theta = \tfrac12(u + v)$ and the range $\hat L = v - u$ are maximum likelihood
estimates of $\theta$ and $L$, respectively, as they are the parameter values which
uniquely maximize $f(x_1, \cdots, x_n)$ for the given set of observations. We shall
assume that the random variable is normalized by change of origin and change
of scale so that $\theta = 0$ and $L = 1$. The joint probability density function of $u$
and $v$ is

(1) $$f(u, v) = \frac{\partial^2 F(u, v)}{\partial v\,\partial(-u)} = \frac{\partial^2 (v - u)^n}{\partial v\,\partial(-u)} = n(n-1)(v - u)^{n-2}.$$




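Making the transformation $\hat\theta = \tfrac12(u + v)$, $\hat L = v - u$ in (1),

(2) $$f(\hat\theta, \hat L) = n(n-1)\hat L^{n-2}, \qquad (0 < 2|\hat\theta| < 1 - \hat L < 1).$$

Integrating out $\hat L$ from 0 to $(1 - 2|\hat\theta|)$,

(3) $$f(\hat\theta) = n(1 - 2|\hat\theta|)^{n-1}, \qquad (|\hat\theta| < \tfrac12),$$
$$|F(\hat\theta) - F(0)| = \tfrac12 - \tfrac12(1 - 2|\hat\theta|)^{n}, \qquad (|\hat\theta| < \tfrac12).$$

Odd moments vanish by symmetry; even order moments are

(4) $$\mu_{2k}(\hat\theta) = \int_{-1/2}^{1/2} n\,\hat\theta^{2k}(1 - 2|\hat\theta|)^{n-1}\,d\hat\theta = 2^{-2k}\binom{n+2k}{2k}^{-1}.$$

In (2), integrating out $\hat\theta$ from $\tfrac12(\hat L - 1)$ to $\tfrac12(1 - \hat L)$,

$$f(\hat L) = n(n-1)\hat L^{n-2}(1 - \hat L), \qquad (0 < \hat L < 1),$$

(5) $$F(\hat L) = n(n-1)\int_0^{\hat L} L^{n-2}(1 - L)\,dL = n(n-1)B_{\hat L}(n-1, 2), \qquad (0 < \hat L < 1),$$

$$\mu_k(\hat L) = n(n-1)\int_0^1 L^{n+k-2}(1 - L)\,dL = \frac{n(n-1)}{(n+k)(n+k-1)}.$$

Thus $\mu_1(\hat L) = (n-1)/(n+1)$; hence the bias of $\hat L$ can be removed by
multiplying $\hat L$ by $(n+1)/(n-1)$.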
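The moment formulas of this section can be checked in exact arithmetic. The following is a minimal sketch, assuming the closed forms read from (4) and from the range moments above (even midrange moments $2^{-2k}/\binom{n+2k}{2k}$, range moments $n(n-1)/((n+k)(n+k-1))$) for the normalized population with $\theta = 0$, $L = 1$; the function names are illustrative and are not part of the note.

```python
from fractions import Fraction
from math import comb


def midrange_even_moment(n: int, k: int) -> Fraction:
    """Moment of order 2k of the sample midrange, as read from (4)."""
    return Fraction(1, 2 ** (2 * k) * comb(n + 2 * k, 2 * k))


def range_moment(n: int, k: int) -> Fraction:
    """Moment of order k (about zero) of the sample range."""
    return Fraction(n * (n - 1), (n + k) * (n + k - 1))


def unbiasing_factor(n: int) -> Fraction:
    """Multiplier (n + 1)/(n - 1) that removes the bias of the sample range."""
    return Fraction(n + 1, n - 1)


if __name__ == "__main__":
    n = 5
    print("variance of midrange:", midrange_even_moment(n, 1))
    print("mean of range:", range_moment(n, 1))
    print("mean of corrected range:", range_moment(n, 1) * unbiasing_factor(n))
```

For n = 5 the midrange variance comes out as 1/(2(n+1)(n+2)) and the corrected range has mean 1, the true range of the normalized population.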
The statistic $t = \hat\theta/\hat L$ can be used to test the hypothesis that the mean of a
rectangular distribution of unknown range is 0. To obtain the distribution of $t$
when the hypothesis is true, set $t = \hat\theta/\hat L$ and $\hat L = \hat L$ in (2):

$$f(t, \hat L) = n(n-1)\hat L^{n-1}, \qquad (0 < \hat L < (1 + 2|t|)^{-1}),$$

(6) $$f(t) = (n-1)(1 + 2|t|)^{-n},$$
$$|F(t) - F(0)| = \tfrac12 - \tfrac12(1 + 2|t|)^{-(n-1)}.$$

Moments of $t$ do not exist for order greater than $(n - 2)$; for orders not
exceeding $n - 2$, odd moments vanish by symmetry and

$$\mu_{2k}(t) = 2(n-1)\int_0^{\infty} t^{2k}(1 + 2t)^{-n}\,dt = 2^{-2k}\binom{n-2}{2k}^{-1}.$$


3. Limiting distributions. $\hat\theta$, $\hat L$, and $t$ have non-normal limiting distributions,
although $\hat\theta$ and $\hat L$ are maximum likelihood estimates; this is explained by the
discontinuity of $f(x, \theta)$ at $x = \theta \pm \tfrac12$. We obtain the limiting distributions of
$q = n\hat\theta$ and $r = n(1 - \hat L)$. Substituting $q$ and $r$ in (2), and proceeding to the
limit for increasing $n$,

$$\lim f(q, r) = \lim \frac{n-1}{n}\left(1 - \frac{r}{n}\right)^{n-2} = e^{-r}, \qquad (0 < 2|q| < r < \infty).$$

The necessary simple integrations yield the following limiting distributions:

(7) $$f(q) = e^{-2|q|},$$
$$|F(q) - F(0)| = \tfrac12 - \tfrac12 e^{-2|q|},$$
$$\mu_{2k}(q) = (2k)!/2^{2k}; \qquad \mu_{2k+1} = 0,$$
$$f(r) = r e^{-r}, \qquad (r > 0),$$
$$F(r) = 1 - (1 + r)e^{-r}, \qquad (r > 0),$$
$$\mu_k(r) = (k + 1)!\,.$$

The limiting distribution of $s = nt$ is the same as that of $n\hat\theta$, as is seen by
comparing (3) and (6).


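4. Comparison of $\hat\theta$ with $\bar x$ and $\tilde x$ as estimates of $\theta$. The sample mean $\bar x$ and
median $\tilde x$ are unbiased estimates of $\theta$.

(8) $$\mu_2(\bar x) = \frac{1}{n}\int_{-1/2}^{1/2} x^2\,dx = 1/(12n).$$

$$\mu_2(\tilde x) = \int \tilde x^2 f(\tilde x)\,d\tilde x = \int_{-1/2}^{1/2} \frac{(2m+1)!}{m!\,m!}\,\tilde x^2 (\tfrac12 - \tilde x)^m (\tfrac12 + \tilde x)^m\,d\tilde x,$$

for $n = 2m + 1$, $m$ an integer. Substituting $z = 1 - 4\tilde x^2$, then simplifying the
Beta function obtained on integration,

(9) $$\mu_2(\tilde x) = \frac{(2m+1)!}{m!\,m!\,2^{2m+3}}\int_0^1 z^m (1 - z)^{1/2}\,dz = \frac{1}{4(2m+3)} = \frac{1}{4(n+2)}.$$

(4), with $k = 1$, gives $\mu_2(\hat\theta) = 1/(2(n+1)(n+2))$. Comparison of this with (8)
and (9) shows that $\mu_2(\hat\theta)/\mu_2(\bar x) = 6n/((n+1)(n+2))$ and $\mu_2(\tilde x)/\mu_2(\bar x) = 3n/(n+2)$.
As $n$ increases, $\mu_2(\hat\theta)/\mu_2(\bar x) \to 6/n \to 0$; and $\mu_2(\tilde x)/\mu_2(\bar x) \to 3$. Thus the
"efficiency" of the mean is zero, and the median is only one-third as "efficient" as the
mean. (The concept of efficiency is not strictly applicable as $\hat\theta$ does not have a
normal limiting distribution.)

5. Limiting distribution of difference between two midranges. Let $\hat\theta_1$ and
$\hat\theta_2$ be the midranges of samples of $n_1$ and $n_2$ observations, respectively, from two
normalized rectangular populations, and let $z = q_1 - q_2 = n_1\hat\theta_1 - n_2\hat\theta_2$. Applying
the formula for composition of random variables, one obtains from (7),

$$f(z) = \int_{-\infty}^{\infty} f(z - q) f(q)\,dq = \int_{-\infty}^{\infty} e^{-2|z - q|} e^{-2|q|}\,dq$$

(10) $$= \int_0^{\infty} e^{-2|z|} e^{-4q}\,dq + \int_0^{|z|} e^{-2|z|}\,dq + \int_{|z|}^{\infty} e^{2|z|} e^{-4q}\,dq$$
$$= \tfrac14 e^{-2|z|} + |z| e^{-2|z|} + \tfrac14 e^{-2|z|} = (|z| + \tfrac12) e^{-2|z|}.$$

$$|F(z) - F(0)| = \tfrac12 - \tfrac12(1 + |z|) e^{-2|z|}.$$

$$\mu_{2k}(z) = (k + 1)(2k)!/2^{2k}.$$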
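The composition leading to (10) can be checked numerically. The sketch below assumes the limiting density $e^{-2|q|}$ of (7) and the closed form $(|z| + \tfrac12)e^{-2|z|}$ for (10); it approximates the convolution integral by a midpoint rule over a truncated interval. The truncation width and step count are arbitrary illustrative choices.

```python
import math


def midrange_limit_density(q: float) -> float:
    """Limiting density of q = n * midrange, taken from (7)."""
    return math.exp(-2.0 * abs(q))


def difference_density_closed(z: float) -> float:
    """Closed form read from (10): (|z| + 1/2) exp(-2|z|)."""
    return (abs(z) + 0.5) * math.exp(-2.0 * abs(z))


def difference_density_numeric(z: float, half_width: float = 20.0,
                               steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the convolution integral for f(z)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        q = -half_width + (i + 0.5) * h
        total += midrange_limit_density(z - q) * midrange_limit_density(q)
    return total * h


if __name__ == "__main__":
    for z in (0.0, 0.7, -1.5):
        print(z, difference_density_numeric(z), difference_density_closed(z))
```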




The statistic

$$\frac{n_1(v_1 + u_1)}{2(v_1 - u_1)} - \frac{n_2(v_2 + u_2)}{2(v_2 - u_2)}$$

can be used to test the hypothesis of equality of
means of any two rectangular populations, and has in the limit the distribution
(10), if the means of the populations are equal.


6. The one-parameter rectangular distribution. If $f(x) = 1/\lambda$, $(0 < x < \lambda)$,
then $f(x_1, \cdots, x_n \mid v) = v^{-(n-1)}$. Thus $v$ is a sufficient statistic and is evidently
the maximum likelihood estimate of $\lambda$. Here $F(v) = (v/\lambda)^n$; $f(v) = n v^{n-1}/\lambda^n$;
and $\mu_k(v) = \lambda^k n/(n + k)$. The normalized error $y = n(\lambda - v)/\lambda$ has the probability
density function $f(y) = (1 - y/n)^{n-1}$, which tends to $e^{-y}$ as $n$ increases.


REFERENCE 


[1] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Phil.
Trans. Roy. Soc. London, Series A, Vol. 222 (1921), pp. 309-368.


ON THE POWER FUNCTION OF THE SIGN TEST FOR 
SLIPPAGE OF MEANS 


By JOHN E. WALSH

Princeton University 
1. Summary. This note compares the power functions of the sign test for
slippage with the power functions of the most powerful test for the case of normal
populations. The sign test is found to be approximately 95% efficient for
small samples.


2. Introduction. Let us consider a univariate population whose mean equals
its median and whose cumulative distribution function is continuous at the
mean. A sampling method of testing the supposition that the mean of this
population exceeds a given constant value $\mu_0$ (slippage to the right) is furnished
by considering how many values of the sample are less than $\mu_0$. An analogous
method applies for testing whether the mean is less than $\mu_0$ (slippage to the left).
A particular class of populations for which the sign test is valid are the normal
populations. This note compares the power functions of the sign test with the
power functions of the most powerful test for slippage for the case in which the
population is normal (Table I). It is shown that the sign test is approximately
95% as efficient as the most powerful test (the Student t-test) for samples of size
4, 5 and 6, and that although the relative efficiency of the sign test decreases as
the sample size increases, its efficiency is approximately 75% for samples of size
13. This supports the idea that for normal populations little efficiency is lost
by using attributes instead of continuous variables if the sample size is small.

In choosing between the sign and Student t-tests for slippage the following
considerations may be of interest:

(a) The sign test is valid for a more general class of populations than the t-test.

(b) The sign test is almost as efficient as the t-test for small samples from normal
populations.

(c) The sign test is much more easily computed than the t-test.

(d) The sign test has a very limited choice of significance levels for small
samples while the t-test can have any desired significance level for any size
sample.

The considerations (a) to (d) also apply in choosing between the sign test and
the Daly test based on $(\bar x - \mu_0)/R$, where $\bar x$ is the mean and $R$ the range of the
sample used for the test (see [1]).

In section 5, Table II shows that for small size samples the significance levels
of the sign test do not change greatly if the mean is only approximately equal
to the median.


3. Statement of sign test. Let $x_1, \cdots, x_n$ be a sample of size $n$ from a univariate
population whose mean equals its median and whose cumulative distribution
function is continuous at the mean, that is, which has the property that

(1) $$\Pr(x < \mu) = \Pr(x > \mu) = \tfrac12,$$

where $\mu$ is the population mean.
The significance test to decide whether $\mu$ exceeds a given constant value $\mu_0$
is defined by

(2) If $m$ or less of the sample values $x_1, \cdots, x_n$ are less than $\mu_0$, accept $\mu > \mu_0$.

The significance test to decide whether $\mu < \mu_0$ is given by

(3) If $m$ or less of $x_1, \cdots, x_n$ are greater than $\mu_0$, accept $\mu < \mu_0$.

It is to be observed that in both (2) and (3) the null hypothesis tested is that
$\mu = \mu_0$. In (2) the alternative is $\mu > \mu_0$ and in (3) the alternative is $\mu < \mu_0$.

From (1) it follows immediately that (2) and (3) both have the same significance
level $\alpha(m, n)$, where

$$\alpha(m, n) = \left(\tfrac12\right)^n \sum_{j=0}^{m} \frac{n!}{j!\,(n - j)!}.$$

Appropriate choices of $m$ and $n$ will result in values of $\alpha(m, n)$ suitable for
significance tests. For example

$\alpha(0, 4) = .0625$,   $\alpha(1, 8) = .0352$
$\alpha(0, 5) = .0312$,   $\alpha(1, 9) = .0195$
$\alpha(0, 6) = .0156$,   $\alpha(1, 10) = .0107$
$\alpha(1, 7) = .0625$,   $\alpha(2, 13) = .0112$.
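The significance levels listed above follow from the binomial sum with probability one-half. As a minimal sketch (the function name is illustrative, not from the note), the level can be computed exactly with rational arithmetic:

```python
from fractions import Fraction
from math import comb


def sign_test_level(m: int, n: int) -> Fraction:
    """Exact level alpha(m, n) = (1/2)^n * sum_{j=0}^{m} C(n, j)."""
    return Fraction(sum(comb(n, j) for j in range(m + 1)), 2 ** n)


if __name__ == "__main__":
    for m, n in [(0, 4), (0, 5), (0, 6), (1, 7), (1, 8), (1, 9), (1, 10), (2, 13)]:
        print(f"alpha({m}, {n}) = {float(sign_test_level(m, n)):.5f}")
```

Because the levels are restricted to multiples of $2^{-n}$, only a few of them fall near conventional significance levels when $n$ is small.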


If the population has a continuous distribution function, $\Pr(x_i = x_j;\ i \ne j) = 0$.
In this case let $x_{(i)}$ be the $i$th of $x_1, \cdots, x_n$ in increasing order of
magnitude, so that $x_{(1)}$ is the smallest and $x_{(n)}$ the largest. Then (2) can be
restated as

(4) If $x_{(m+1)} > \mu_0$, accept $\mu > \mu_0$.

Test (3) is seen to be equivalent to

(5) If $x_{(n-m)} < \mu_0$, accept $\mu < \mu_0$.

Thus for the case of populations with continuous distribution functions it is
only necessary to determine one order statistic and compare it with $\mu_0$ in order
to apply a test.
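The equivalence between the counting form (2) and the order-statistic form (4) can be illustrated as follows. This is a sketch under the assumptions that no sample value equals $\mu_0$ and that $m < n$; the order statistics are taken in increasing order, and the function names are hypothetical.

```python
def accept_by_count(sample, mu0, m):
    """Form (2): accept mu > mu0 if at most m sample values are below mu0."""
    return sum(1 for x in sample if x < mu0) <= m


def accept_by_order_statistic(sample, mu0, m):
    """Form (4): accept mu > mu0 if the (m+1)-th smallest value exceeds mu0."""
    ordered = sorted(sample)
    return ordered[m] > mu0  # index m is the (m+1)-th smallest value


if __name__ == "__main__":
    data = [0.3, -1.2, 2.5, 0.9, 1.7]
    for m in range(len(data)):
        print(m, accept_by_count(data, 0.0, m), accept_by_order_statistic(data, 0.0, m))
```

Only one order statistic has to be located, so a partial sort would suffice for large samples.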

It is to be observed that a particular class of populations which satisfy (1) are
those which have distribution functions which are symmetrical and continuous.
Thus the normal populations represent a particular class for which (4) and (5)
are valid.


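4. Comparison with Student t-test. Consider the case in which the population
is normal with mean $\mu$ and variance $\sigma^2$. Then the power function for (4)
is given by

$$\text{Power Function} = \Pr(x_{(m+1)} > \mu_0) = \Pr\left(\frac{x_{(m+1)} - \mu}{\sigma} > \frac{\mu_0 - \mu}{\sigma}\right)$$

$$= \frac{n!}{m!\,(n - m - 1)!}\int_{\delta}^{\infty}\left(\int_{-\infty}^{x} f(y)\,dy\right)^{m}\left(\int_{x}^{\infty} f(y)\,dy\right)^{n-m-1} f(x)\,dx,$$

where

$$f(y) = \frac{1}{\sqrt{2\pi}}\,e^{-y^2/2} \quad\text{and}\quad \delta = \frac{\mu_0 - \mu}{\sigma}.$$

For a normal population, however, it is well known that the most powerful
Studentized test of the one-sided alternative $\mu > \mu_0$ is the appropriate Student
t-test. Values of the power function for the t-test are found for given values of
$\delta$ by using the normal approximation given in [2].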
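For the sign test itself the power can be evaluated without numerical integration, since the event that the order statistic of rank $m + 1$ exceeds $\mu_0$ is the event that at most $m$ observations fall below $\mu_0$. The sketch below uses this binomial form rather than the integral written above; it assumes a normal population and takes $\delta = (\mu_0 - \mu)/\sigma$, so that a single observation falls below $\mu_0$ with probability $\Phi(\delta)$. The function names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sign_test_power(m: int, n: int, delta: float) -> float:
    """Probability that at most m of n normal observations fall below mu0."""
    q = normal_cdf(delta)  # chance that one observation is below mu0
    return sum(math.comb(n, j) * q ** j * (1.0 - q) ** (n - j)
               for j in range(m + 1))


if __name__ == "__main__":
    print(round(sign_test_power(0, 4, -0.5), 3))  # m = 0, n = 4, delta = -1/2
    print(sign_test_power(0, 4, 0.0))             # reduces to alpha(0, 4)
```

At $\delta = 0$ the power reduces to the significance level $\alpha(m, n)$, which gives a quick consistency check.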

The method of measuring the relative efficiencies of the two types of tests will
be different from the common method of measuring the relative efficiencies of
estimates, which consists in taking the ratio of the variances of the two estimates
as the measure of their relative efficiency. The principle followed here
will be to consider a sign test based on a given sample size and vary the degrees
of freedom of the t-test having the same significance level until the power functions
of the sign test and t-test agree in the sense that in the half-plane $\delta \le 0$
the area between the two power curves for which the sign test power function
exceeds the t-test power function is equal to the analogous area for which the
sign test power function is less than the t-test power function. The considerations
are limited to the half-plane $\delta \le 0$ because the test is one-sided. The size
of the t-test sample having this property divided by the size of the sign test sample
is called the relative efficiency of that sign test. Intuitively this relative
efficiency measures how much more data must be added if the sign test is to
furnish an amount of information equivalent to that supplied by the t-test. In
obtaining the relative efficiencies in the manner described above, the degrees of
freedom of the t-test are allowed to assume fractional values and the values of
the power function are computed using the normal approximation as if it were
valid for fractional degrees of freedom. The number of degrees of freedom, of
course, can only be integral. This method, however, gives an interpolated
measure of the size sample of the t-test having the properties outlined above.
Table I supplies a comparison of the relative efficiencies and the powers of the
sign test and the t-test obtained in the manner just described. Thus for samples
of size 4, 5 and 6 the sign test is approximately 95% as efficient as the Student
t-test. The relative efficiency decreases as the size of the sample increases but
even for samples as large as 13 is approximately 75%.


TABLE I

A comparison of the power functions of the sign and t tests

     |   |      | Approximate |              |      Values of Power Function
Test | m |  n   |  Relative   | Significance |----------------------------------------
     |   |      | Efficiency  |    Level     | δ = -1/2 | δ = -1 | δ = -3/2 | δ = -2
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 3.8  |             |    .0625     |   .219   |  .484  |   .755   |  .920
sign | 0 | 4    |     95%     |    .0625     |   .229   |  .500  |   .755   |  .908
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 4.8  |             |    .0312     |   .150   |  .402  |   .709   |  .909
sign | 0 | 5    |     96%     |    .0312     |   .159   |  .420  |   .703   |  .888
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 5.7  |             |    .0156     |   .098   |  .330  |   .660   |  .899
sign | 0 | 6    |     95%     |    .0156     |   .110   |  .355  |   .655   |  .863
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 5.6  |             |    .0625     |   .306   |  .695  |   .932   |  .995
sign | 1 | 7    |     80%     |    .0625     |   .311   |  .711  |   .920   |  .988
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 6.4  |             |    .0352     |   .225   |  .619  |   .908   |  .989
sign | 1 | 8    |     80%     |    .0352     |   .239   |  .630  |   .869   |  .978
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 7.4  |             |    .0195     |   .171   |  .565  |   .893   |  .988
sign | 1 | 9    |     82%     |    .0195     |   .182   |  .573  |   .879   |  .974
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 8    |             |    .0107     |   .117   |  .468  |   .848   |  .983
sign | 1 | 10   |     80%     |    .0107     |   .137   |  .515  |   .853   |  .964
-----|---|------|-------------|--------------|----------|--------|----------|--------
t    |   | 9.75 |             |    .0112     |   .162   |  .631  |   .950   |  .998
sign | 2 | 13   |     75%     |    .0112     |   .165   |  .661  |   .949   |  .998







For normal populations it is also well known that the most powerful Studentized
test of the alternative $\mu < \mu_0$ is given by the appropriate Student t-test.
It is clear that Table I can also be considered as a comparison of test (5) with
the corresponding Student t-test if $\delta$ is replaced by $-\delta$ and $m$ by $n - m$.


5. Approximate cases. Suppose that (1) is only approximately satisfied by
the population in question.
Let $\Pr(x < \mu) = \tfrac12 + r$. Then the significance level of (2) is

(6) $$\sum_{j=0}^{m} \frac{n!}{j!\,(n - j)!}\left(\tfrac12 + r\right)^{j}\left(\tfrac12 - r\right)^{n-j}.$$

Significance levels of (2) for small size samples are given in Table II as a function
of $r$.


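TABLE II

A comparison of the significance levels of the sign test when the mean differs from
the median

       |            Significance Level
 m | n | r = -.05 | r = -.02 | r = .02
---|---|----------|----------|---------
 0 | 4 |   .091   |          |  .053
 0 | 5 |   .050   |   .038   |  .026
 0 | 6 |   .028   |          |  .012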


Table II shows that for small samples the significance level of (2) does not change
greatly from $\alpha(m, n)$ if (1) is only approximately satisfied. Expression (6)
shows, however, that for large size samples even a small value of $r$ can cause a
large change in the significance level of (2).

For $\Pr(x < \mu) = \tfrac12 + r$ it is apparent that the significance level of (3) is (6)
with $r$ replaced by $-r$ so that Table II applies to test (3) if this replacement is
made.
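The effect of a small departure $r$ on the significance level can be explored directly from expression (6). The following sketch assumes (6) is the binomial tail with success probability $\tfrac12 + r$, as the definition $\Pr(x < \mu) = \tfrac12 + r$ implies; the function name and the sample sizes tried are illustrative.

```python
from math import comb


def shifted_sign_test_level(m: int, n: int, r: float) -> float:
    """Level of test (2) when Pr(x < mu) = 1/2 + r, following expression (6)."""
    p = 0.5 + r
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(m + 1))


if __name__ == "__main__":
    for n in (4, 5, 6, 100):
        m = 0 if n < 100 else 40  # m = 40 is an arbitrary large-sample choice
        levels = [shifted_sign_test_level(m, n, r) for r in (-0.05, 0.0, 0.05)]
        print(n, m, [round(v, 4) for v in levels])
```

For the small samples the levels move only modestly with $r$, while for the larger sample the same $r$ changes the level by a large factor, in line with the remark above.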


REFERENCES 
[1] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test",
Annals of Math. Stat., Vol. 17 (1946), pp. 71-74.

[2] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution",
Biometrika, Vol. 31 (1940), p. 376.





AN APPROXIMATION TO THE PROBABILITY INTEGRAL 


By J. D. WILLIAMS
United States Naval Ordnance Test Station, Inyokern, California

1. Summary. It is shown that

$$\frac{1}{\sqrt{2\pi}}\int_{-x}^{x} e^{-t^2/2}\,dt \le \left\{1 - e^{-2x^2/\pi}\right\}^{1/2}$$

and that the equality is never in error by as much as three-fourths of one percent.
Other approximations are discussed.


2. For use on those occasions when an approximate analytic expression for
the integral

(1) $$p(x) = \frac{1}{\sqrt{2\pi}}\int_{-x}^{x} e^{-t^2/2}\,dt$$

is desired, the approximation

(2) $$p'(x) = \left\{1 - e^{-2x^2/\pi}\right\}^{1/2}$$

is simple and reasonably accurate. An approximation equivalent to this is
quite commonly used in problems involving a bivariate normal distribution,
but its use in the one-dimensional case seems to be less well known.

We shall first show that $p(x) \le p'(x)$ and then estimate, by calculation,
the relative error made when the equality is accepted.


$$p(x) = \frac{1}{\sqrt{2\pi}}\int_{-x}^{x} e^{-t^2/2}\,dt = \left\{\frac{1}{2\pi}\int_{-x}^{x}\int_{-x}^{x} e^{-(t_1^2 + t_2^2)/2}\,dt_1\,dt_2\right\}^{1/2}$$

(3) $$\le \left\{\frac{1}{2\pi}\int_0^{2\pi}\int_0^{2x/\sqrt{\pi}} r e^{-r^2/2}\,dr\,d\theta\right\}^{1/2} = \left\{1 - e^{-2x^2/\pi}\right\}^{1/2} = p'(x), \quad\text{q.e.d.}$$

The approximation, introduced at the stage of passage to polar coordinates,
comprises replacement of the square region of integration $-x < t_i < x$ by a
circular region, $0 < r < 2x/\sqrt{\pi}$, having the same area. Since we are dealing
with a circular normal distribution with zero means, the region of fixed area
which covers the greatest density is a circle whose center is at the origin.
Therefore our square region of area $4x^2$ must contain less density than the
circular region of area $4x^2$ by which we have replaced it.

The maximum value of the relative error,

(4) $$\epsilon_p = \frac{p'(x)}{p(x)} - 1,$$

is found by calculation to be about seven-tenths of one percent, as may be judged
from Table 1, column 3.
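The size of the relative error can be reproduced with a few lines of code. This is a sketch that assumes the approximation has the form $\{1 - e^{-cx^2}\}^{1/2}$ with $c = 2/\pi$ for (2), and that evaluates the exact integral (1) through the error function, $p(x) = \operatorname{erf}(x/\sqrt{2})$; the grid of $x$ values is an arbitrary choice.

```python
import math


def p_exact(x: float) -> float:
    """Integral (1): normal probability of the interval (-x, x)."""
    return math.erf(x / math.sqrt(2.0))


def p_approx(x: float, c: float = 2.0 / math.pi) -> float:
    """Approximation {1 - exp(-c x^2)}^(1/2); c = 2/pi corresponds to (2)."""
    return math.sqrt(1.0 - math.exp(-c * x * x))


def relative_error(x: float, c: float = 2.0 / math.pi) -> float:
    """Relative error p'(x)/p(x) - 1 of the approximation."""
    return p_approx(x, c) / p_exact(x) - 1.0


if __name__ == "__main__":
    grid = [k / 100.0 for k in range(1, 401)]
    print("p'(1), p(1):", round(p_approx(1.0), 4), round(p_exact(1.0), 4))
    print("largest relative error:", max(relative_error(x) for x in grid))
```

On this grid the error stays positive, as the inequality requires, and peaks near $x = 1.5$.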

The question may be asked: Can the relative error be reduced by suitable
choice of the parameter $c$ in

(5) $$p'(x) = \left\{1 - e^{-cx^2}\right\}^{1/2}\,?$$

Calculation indicates that by taking $c = 0.6302$ the relative error is reduced to
about one-half of one percent; but this gain is offset, for many purposes, by the
loss of the inequality (3).

The density function implied by (2), namely

(6) $$\varphi'(x) = \frac{x}{\pi}\,e^{-2x^2/\pi}\left\{1 - e^{-2x^2/\pi}\right\}^{-1/2},$$

has the variance

(7) $$\sigma^2 = \pi(1 - \log 2) = 0.964.$$

If $c$ is determined so that the density function will have unit variance, then
(5) becomes

(8) $$p'(x) = \left[1 - \left(\frac{e}{2}\right)^{-2x^2}\right]^{1/2};$$

this approximation to (1) leads to relative errors of almost two percent, which
occur when $x$ is small.

The density function (6) may be used to judge the quality of (2) in approximating
to an integral of the form

(9) $$\frac{1}{\sqrt{2\pi}}\int_{x_1}^{x_2} e^{-t^2/2}\,dt,$$

the approximation being

(10) $$p'(x_1, x_2) = \tfrac12\left[p'(x_2) - p'(x_1)\right]$$

when $x_1$ and $x_2$ are positive (which is the severe case). It is evident that the
relative error in accepting (10) for (9) cannot exceed the greatest relative
discrepancy $\epsilon_\varphi$, in the interval $x_1 < x < x_2$, between density function (6) and the
normal density

(11) $$\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$

The quantity

(12) $$\epsilon_\varphi = \frac{\varphi'(x)}{\varphi(x)} - 1$$

is tabulated in Table 1, column 6, from which it appears that the relative error
committed in using (10) for (9) will surely be less than one-and-a-half percent
provided $0 < x_i < 1.8$; but the relative error may be very great when the interval
of integration lies beyond $x = 1.8$.


TABLE 1

  x  | p'(x) | p(x)  |  ε_p  | φ'(x) | φ(x)  |  ε_φ
-----|-------|-------|-------|-------|-------|--------
 0   | 0     | 0     | 0     | .3989 | .3989 |  0
 .1  | .0797 | .0797 | .0002 | .3969 | .3970 |  .0005
 .2  | .1586 | .1585 | .0005 | .3914 | .3910 |  .0010
 .3  | .2360 | .2358 | .0008 | .3821 | .3814 |  .0018
 .4  | .3112 | .3108 | .0013 | .3695 | .3683 |  .0033
 .5  | .3836 | .3829 | .0018 | .3539 | .3521 |  .0051
 .6  | .4526 | .4515 | .0024 | .3356 | .3332 |  .0072
 .7  | .5177 | .5161 | .0031 | .3151 | .3123 |  .0089
 .8  | .5785 | .5763 | .0038 | .2929 | .2897 |  .0111
 .9  | .6347 | .6319 | .0044 | .2695 | .2661 |  .0128
 1.0 | .6862 | .6827 | .0051 | .2454 | .2420 |  .0141
 1.1 | .7329 | .7287 | .0058 | .2211 | .2179 |  .0147
 1.2 | .7747 | .7699 | .0063 | .1971 | .1942 |  .0149
 1.3 | .8118 | .8064 | .0067 | .1738 | .1714 |  .0140
 1.4 | .8443 | .8385 | .0069 | .1516 | .1497 |  .0127
 1.5 | .8725 | .8664 | .0070 | .1306 | .1295 |  .0085
 1.6 | .8967 | .8904 | .0070 | .1113 | .1109 |  .0036
 1.7 | .9171 | .9109 | .0068 | .0937 | .0940 | -.0032
 1.8 | .9341 | .9281 | .0065 | .0781 | .0790 | -.0114
 1.9 | .9485 | .9426 | .0063 | .0640 | .0656 | -.0244
 2.0 | .9600 | .9545 | .0058 | .0520 | .0540 | -.0370


The approximations described herein were suggested by the following situation,
encountered in work done by the Applied Mathematics Panel, NDRC:
The probability $P$ of at least one success, defined by $-x < x_i < x$, in a sample
of $n$ pairs $(x_1, x_2)$ from a population in which the independent component
probabilities are $p(x)$, is

(13) $$P = 1 - \left[1 - p^2(x)\right]^n.$$

A little numerical exploration, supplemented by examination of the limiting
values as $x \to 0$ and $x \to \infty$, revealed that when $P$ is fixed the quantity $\log n$ is
very nearly a linear function, of slope minus two, of $\log x$; so nearly, in fact,
that one was encouraged to posit the linearity and observe the consequences.
This yielded (5), which became (2) by requiring that it go to zero with $x$ in the
same manner as (1).
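The near-linearity of $\log n$ in $\log x$ can be seen by solving (13) for $n$. The sketch below assumes (13) in the form $P = 1 - [1 - p^2(x)]^n$ and compares the exact $n$ with the value $n = -\log(1 - P)/(cx^2)$ that follows when $p^2(x)$ is replaced by $1 - e^{-cx^2}$ with $c = 2/\pi$; $n$ is treated as a continuous quantity and the function names are illustrative.

```python
import math


def p_exact(x: float) -> float:
    """Integral (1): normal probability of the interval (-x, x)."""
    return math.erf(x / math.sqrt(2.0))


def n_required_exact(P: float, x: float) -> float:
    """Solve P = 1 - (1 - p(x)^2)^n for n."""
    return math.log(1.0 - P) / math.log(1.0 - p_exact(x) ** 2)


def n_required_approx(P: float, x: float, c: float = 2.0 / math.pi) -> float:
    """Same quantity with p(x)^2 replaced by 1 - exp(-c x^2)."""
    return -math.log(1.0 - P) / (c * x * x)


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 2.0):
        print(x, n_required_exact(0.9, x), n_required_approx(0.9, x))
```

Under the approximation, doubling $x$ divides the required $n$ by exactly four, which is the slope of minus two on logarithmic scales.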







DISTRIBUTION OF THE RATIO OF SAMPLE RANGE TO SAMPLE 
STANDARD DEVIATION FOR NORMAL AND COMBINATIONS 
OF NORMAL DISTRIBUTIONS 


By G. A. BAKER 
College of Agriculture, University of California at Davis 

1. Introduction. The distribution of sample ranges in terms of the stand- 
ard deviation of the sampled population for homogeneous populations has been 
dealt with in some detail by mathematical methods for the normal parent and by 
empirical sampling methods for non-normal parents. These results are pre- 
sented in summary in Tables XXII, XXIII, and XXIV of [1]. Bliss [2] suggests 
that the range in different sized samples from a normal parent at various levels 
of significance, in terms of the standard deviation computed with varying degrees 
of freedom, would be a valuable table. It is not clear whether he means that 
the standard deviation is to be estimated from the same sample as the range or 
from a second independent sample, as is done by Newman [3], Pearson and 
Hartley [4], and Hartley [5]. 

In natural hybridization of distinct types of plants and subsequent back cross- 
ing with parental types distinctly bimodal populations may develop. Heiser 
[6] has described such a situation for sunflowers. Similar situations may occur 
in natural and artificial crossing of peaches and apricots as shown by the work of 
Hesse [7] of this station. In studying such genetical material it often would be 
helpful to know the expected distributions of the sample ranges in terms of the 
sample standard deviations estimated from the same sample for certain typical 
nonhomogeneous populations. Applications to such data will be published
elsewhere. 

Since the mathematical situation for the distributions of the sample range 
(R) in terms of the sample standard deviation (s) appears somewhat complex, 
empirical sampling methods were resorted to for obtaining the distributions for a 
normal parent (N), a symmetrical distinctly bimodal nonhomogeneous parent 
(A), and a weakly bimodal but strongly skewed parent (B). Populations A and 
B are pictured in charts A (p. 341) and B (p. 348) of [8]. 

Population N is approximately represented by 


1296 _ 4 (X — 155)" 


N — exp. ; 


population A by 


(X — 15.5)” 
gies Me Ney ees, iD. — 3 
L 35 + exp. 5 


, (x —) 


25 
and population B by 


972 _ ,(X — 155)’, , _ 
(B) a (exp 3 35 + 3 exp. 


(x - =) 
, 25 







The method of drawing samples is the same as that originally described in [9]. 
N, A, and B each have a total area of 1296. Thus, 1296 integers distributed 
over a proper range and with the frequencies indicated by the corresponding 
areas under the curves N, A, and B were entered on charts with 6 big rows and 
6 big columns of squares which were subdivided into 6 little rows and 6 little 
columns. In each case the 1296 integers were distributed in a non-systematic 
way among the 1296 little squares. By throwing 4 differentiated dice (one die 
assigned to a big row, one to a big column, one to a little row, and one to a little 
column) it was possible to draw random individuals from populations that are 
approximately N, A, and B. 
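The dice procedure amounts to choosing one of $6^4 = 1296$ cells with equal probability. A minimal sketch follows; the numbering of the cells and the stand-in chart are assumptions made for illustration, since the layout of the original charts is not given here.

```python
import random
from itertools import product


def cell_index(big_row: int, big_col: int, little_row: int, little_col: int) -> int:
    """Map four die faces (each 1 to 6) to a cell number from 0 to 1295."""
    big_square = (big_row - 1) * 6 + (big_col - 1)
    little_square = (little_row - 1) * 6 + (little_col - 1)
    return big_square * 36 + little_square


def draw_individual(chart, rng: random.Random):
    """Throw four distinguishable dice and read off the chart entry."""
    faces = [rng.randint(1, 6) for _ in range(4)]
    return chart[cell_index(*faces)]


if __name__ == "__main__":
    rng = random.Random(1946)
    chart = list(range(1296))  # stand-in for the 1296 integers on a chart
    print([draw_individual(chart, rng) for _ in range(5)])
```

Every cell corresponds to exactly one outcome of the four dice, so each of the 1296 entries is equally likely on each draw and successive draws are made with replacement.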

Fisher [10] has defined $g_1$, which measures the skewness of a distribution, and $g_2$,
which measures the flatness. These g's are equivalent to the square root of $\beta_1$
and $\beta_2 - 3$, respectively, in Karl Pearson's older notation. For population A,
$g_1 = 0$ and $g_2 = -1.10$. For population B, $g_1 = 0.62$ and $g_2 = -0.29$.


TABLE 1
Distribution of range in terms of sample standard deviation for samples of specified
sizes from a normal parent population (N), g1 = 0, g2 = 0

Sample | Number of |  Mean  | Standard  |   g1   | Standard Error |   g2   | Standard Error
 Size  |  Samples  |        | Deviation |        | of g1 (Normal) |        | of g2 (Normal)
-------|-----------|--------|-----------|--------|----------------|--------|----------------
   2   |    ...    | 1.4142 |  0.0      |  ...   |      ...       |        |
   4   |   1220    | 2.2238 |  0.1564   | -0.660 |     0.0700     |        |
  16   |    305    | 3.5112 |  0.3879   |        |     0.1396     |        |
  36   |    135    | 4.4014 |  0.6076   |  0.607 |     0.2085     |        |
  64   |     76    | 4.8272 |  0.6409   |  0.492 |     0.2756     |        |
 100   |     48    | 5.1215 |  0.6616   | -0.077 |     0.3432     |        |





2. Empirical random sampling results. The sample sizes considered are 2, 4, 
16, 36, 64, 100. The distribution functions for various sample sizes are char- 
acterized by giving means, standard deviations, g;’s, and g.’s. The results are 
given in Tables 1, 2, and 3. The standard deviations of the samples were com- 
puted by dividing the sum of squares by one less than the number in the sample. 
When the size of the sample is two then the range divided by the standard devia- 
tion of the sample is always a constant, square root of 2. 

The constants for the distributions for all sample sizes except four were com- 
puted without grouping. The constants for the distributions for samples of 
four were computed from grouped data with a small class interval. 
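The statistic studied here, and the remark that it is constant for samples of two, can be illustrated briefly. This sketch assumes the standard deviation is computed with divisor one less than the sample size, as stated above; the function name is illustrative.

```python
import math


def range_over_sd(sample) -> float:
    """Sample range divided by the sample standard deviation (divisor n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (max(sample) - min(sample)) / s


if __name__ == "__main__":
    print(range_over_sd([3.0, 7.0]))             # any two distinct values
    print(range_over_sd([12.0, 15.0, 19.0, 16.0]))
```

For two observations $a$ and $b$ the standard deviation is $|a - b|/\sqrt{2}$, so the ratio is always $\sqrt{2}$.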

3. Discussion. The mean values of the range divided by the standard devia- 
tion of the sample for population A run lower than for populations N and B. 
The standard deviations of the distributions for all parents increase from zero 
and continue to increase throughout the range considered for population N. 



























The standard deviations cut down much more quickly for population A than 
for population B. The values of g; and g2 show that the distributions are sig- 
nificantly non-normal for certain sample sizes but perhaps not seriously so for 
other sample sizes. 

The distributions of range divided by the sample standard deviation are quite 
different from the corresponding distributions of range in terms of the standard 
deviations of the population as can be seen by reference to the tables in [1]. 


TABLE 2
Distribution of range in terms of sample standard deviation for samples of specified
sizes from a bimodal symmetrical population (A), g1 = 0, g2 = -1.10

Sample | Number of |  Mean  | Standard  |   g1   | Standard Error |   g2   | Standard Error
 Size  |  Samples  |        | Deviation |        | of g1 (Normal) |        | of g2 (Normal)
-------|-----------|--------|-----------|--------|----------------|--------|----------------
   2   |    ...    | 1.4142 |  0.0      |  0.0   |      ...       |  0.0   |      ...
   4   |   1040    | 2.2050 |  0.1551   | -0.468 |     0.0758     | -0.356 |     0.1515
  16   |    259    | 3.5742 |  0.5283   |  1.025 |     0.1514     |  1.182 |     0.3015
  36   |    115    | 4.0690 |  0.4604   |  0.561 |     0.2255     | -0.279 |     0.4474
  64   |     64    | 4.3194 |  0.3377   |  0.106 |     0.2993     | -1.829 |     0.5905
 100   |     41    | 4.4846 |  0.3194   |  0.426 |     0.3695     | -0.890 |     0.7245


TABLE 3
Distribution of range in terms of sample standard deviation for samples of specified
sizes from a skewed bimodal population (B), g1 = 0.62, g2 = -0.29

Sample | Number of |  Mean  | Standard  |   g1   | Standard Error |   g2   | Standard Error
 Size  |  Samples  |        | Deviation |        | of g1 (Normal) |        | of g2 (Normal)
-------|-----------|--------|-----------|--------|----------------|--------|----------------
   2   |    ...    | 1.4142 |  0.0      |  ...   |      ...       |  ...   |      ...
   4   |   1061    | 2.2258 |  0.1459   | -0.470 |     0.0751     | -0.142 |     0.1500
  16   |    265    | 3.9277 |  0.5938   |  0.540 |     0.1496     |  0.405 |     0.2982
  36   |    117    | 4.4792 |  0.5476   |  0.400 |     0.2236     |  0.018 |     0.4437
  64   |     66    | 4.8485 |  0.5249   |  0.534 |     0.2950     |  1.028 |     0.5906
 100   |     42    | 5.0481 |  0.3626   | -0.092 |     0.3655     | -0.632 |     0.7166





At the suggestion of the referee it is noted that the empirical results for the
means in Table 1 are rather well approximated by $E(R)/E(s)$. It is necessary
to remember that $E(s) \ne \sigma$ for small samples. For a discussion of $E(s)$ see
Kenney [11] equation 28, page 135.

It is also noted that if

$$X = \log(\log \text{sample size} - \log 2)$$

$$Y = \log\left(\text{mean}\left(\frac{R}{s}\right) - \sqrt{2}\right)$$

then the plots of the (X, Y) values in each case are approximately straight lines
for the present range in sample sizes.
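The approximate linearity can be checked on the tabulated means. The sketch below uses the means for the normal parent as they read in Table 1 (sample sizes 4 to 100) and natural logarithms; the base of the logarithm is an assumption, but it only shifts the points and does not affect linearity. The correlation coefficient serves here as a rough index of straightness and is not part of the note.

```python
import math

SAMPLE_SIZES = [4, 16, 36, 64, 100]
MEANS_NORMAL = [2.2238, 3.5112, 4.4014, 4.8272, 5.1215]  # Table 1, normal parent


def xy_points(sizes, means):
    """X = log(log n - log 2), Y = log(mean(R/s) - sqrt 2) for each sample size."""
    return [(math.log(math.log(n) - math.log(2.0)),
             math.log(mean - math.sqrt(2.0)))
            for n, mean in zip(sizes, means)]


def correlation(points) -> float:
    """Product-moment correlation of the (X, Y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    points = xy_points(SAMPLE_SIZES, MEANS_NORMAL)
    for point in points:
        print(point)
    print("correlation:", correlation(points))
```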

The standard deviation and range when determined from the same sample 
are correlated. For the normal population this correlation decreases and prac- 
tically disappears for samples of 100 or greater. This is not true for populations 
A and B. For these populations the correlation between sample range and 
sample standard deviation decreases much more slowly and seems to be of the
order of 0.5 for samples of 100.


REFERENCES 

[1] Karl Pearson (Editor), Tables for Statisticians and Biometricians, Part II, First
edition, Cambridge Univ. Press, 1931.

[2] C. I. Bliss, "Review of Statistical Tables for Biological, Agricultural and Medical
Research, by R. A. Fisher and F. Yates, Second edition," Science, Vol. 98 (1943), pp.
346-347.

[3] D. Newman, "The distribution of range in samples from a normal population,
expressed in terms of an independent estimate of standard deviation," Biometrika,
Vol. 31 (1939), pp. 20-30.

[4] E. S. Pearson and H. O. Hartley, "Tables of the probability integral of the
studentized range," Biometrika, Vol. 33 (1943), pp. 89-99.

[5] H. O. Hartley, "Studentization or the elimination of the standard deviation of the
parent population from the random sample-distribution of statistics,"
Biometrika, Vol. 33 (1944), pp. 173-180.

[6] Charles Heiser, "An analysis of a hybrid swarm of Helianthus annuus and H.
petiolaris near Flagstaff, Arizona," Unpublished data presented at the Genetics
Seminar at Davis, Calif., January 7, 1946.

[7] C. O. Hesse, Unpublished data.

[8] G. A. Baker, "The relation between the means and variances, means squared and
variances in samples from the combinations of normal populations," Annals of
Math. Stat., Vol. 2 (1931), pp. 333-354.

[9] G. A. Baker, "Random sampling from nonhomogeneous populations," Metron, Vol. 8
(1930), pp. 68-89.

[10] R. A. Fisher, Statistical Methods for Research Workers, 7th edition, Oliver and Boyd,
London and Edinburgh, 1938.

[11] J. F. Kenney, Mathematics of Statistics, Part 2, D. Van Nostrand Co., New York, 1939.





NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.


Personal Items 

Dr. Theodore W. Anderson, Jr. of the Cowles Commission for Economic Re- 
search has been awarded a Guggenheim Memorial Foundation Fellowship. 

Assistant Professor Theodore A. Bancroft of Iowa State College has been 
appointed to an associate professorship at the University of Georgia. 

Dr. Z. W. Birnbaum is now an associate professor in the mathematics depart- 
ment at the University of Washington. 

Mr. Albert H. Bowker and Mr. Edward Paulson, formerly with the Statistical 
Research Group, of Columbia University, have been awarded pre-doctoral 
fellowships in mathematical statistics by the National Research Council. They 
are now studying at Columbia University. 

Mr. Oscar K. Buros, of Rutgers University, is Review Editor of the Journal of 
the American Statistical Association. He is making the review section a very 
important part of the Journal with such features as replicating reviews and biblio- 
graphies of statistical methodology. Members of the Institute who are authors of 
papers and books (both English and non-English) on statistical methodology are 
urged to send a reprint, review copy, or bibliographic information to Mr. Buros 
as soon after publication as possible. 

Professor Harald Cramér, Director of the Institute of Mathematical Statistics
at the University of Stockholm, will be a visiting professor at Princeton University
during the fall semester of the 1946-1947 academic year. He will give
a course of graduate lectures on the theory of probability. 

Dr. J. H. Curtiss has been appointed assistant to the Director of the National 
Bureau of Standards, where his duties will include the administration of the math- 
ematical and statistical activities of the Bureau. Dr. Curtiss served in the U.S. 
Naval Reserve during the war, and recently received a Commendation Ribbon 
from the Secretary of the Navy for his work in statistical engineering for 
the Bureau of Ships and the Office of the Commander-in-Chief. He will con- 
tinue to be on leave of absence from Cornell University throughout the academic 
year 1946-1947. Administrative direction of the Mathematical Tables Project 
of the National Bureau of Standards has been assigned to Dr. Curtiss. Members 
of the Institute are cordially invited to visit the Project when in New York City, 
and to confer with the Project Director, Dr. Arnold Lowan, concerning their 
computational problems. The address of the Project is 150 Nassau Street, New 
York City. The Project is currently supported by funds transferred to the Bu- 
reau from the Office of Research and Inventions of the Navy Department. An 
Advisory Panel of mathematicians interested in the computation of tables is
being formed to define the long range program of the Project. An announcement
as to the personnel of this panel will appear in a later issue of the Annals.

Assistant Professor W. J. Dixon of the University of Oklahoma has been 
appointed to an associate professorship at the University of Oregon. 

Dr. Hallett H. Germond has returned from war service to his teaching duties 
in the Department of Mathematics at the University of Florida. 

Dr. Earl L. Green has accepted a position as Associate Professor of Zoology
at Ohio State University. 

Mr. John C. Hintermaier, formerly supervisory chemist with the Forstmann 
Woolen Company of Passaic, has accepted a position as chief chemist of the Vanity
Fair Mills at Reading.

Mr. William Hodgkinson, Jr., has returned from war service to his position
with the American Telephone and Telegraph Company at New York. 

Mr. Robert H. Hoskins, discharged from the Navy in March, is employed in 
the Actuarial Ordinary General Division of the John Hancock Mutual Life 
Insurance Company at Boston. 

A testimonial dinner was given to Professor Harold Hotelling on May 3, 1946 
at the Columbia University Men’s Faculty Club as a farewell by the Statistical 
Techniques Group, New York Chapter, American Statistical Association. 
Professor Hotelling is leaving Columbia at the end of the academic year to be- 
come Professor of Mathematical Statistics at the University of North Carolina. 
Professor Helen M. Walker, on behalf of the Group, presented gifts to Professor 
and Mrs. Hotelling. The Chairman, Professor Irving Lorge, introduced the 
distinguished visitors who came to honor Professor Hotelling. Among the speakers
were Professor P. C. Mahalanobis of Presidency College, Calcutta, India,
Dr. Stuart Rice, Chairman of the Statistical Commission of the Economic and
Social Council of the United Nations, and Dean Pegram of the Graduate Faculties
of Columbia University. Professor Hotelling reviewed the changes in statistical
theory and techniques that were developed during the 15 years of his
professorship at Columbia University. 

Mr. Calvin J. Kirchen, who has recently accepted a position with the technical 
department of Remington Arms Company at Bridgeport, Conn., addressed the 
Rochester Society of Quality Control Engineers on Sept. 17 on “The Applications
of Sequential Analysis to Acceptance Inspection”.

Dr. Walter Leighton of the Rice Institute has been appointed to a professor- 
ship at Washington University. 

Miss Dorothy Marrow has been appointed to an assistant professorship at 
George Washington University. 

Professor D. E. Morton of the National Bureau of Economic Research is
joining the faculty of Cornell University. 

Assistant Professor Cecil J. Nesbitt of the University of Michigan has been 
promoted to an associate professorship. 

Dr. A. C. Olshen has accepted a position as Actuary of the West Coast Life 
Insurance Company at San Francisco. 







Mr. William B. Rice has opened an office as Consulting Business Statistician 
at 1011 South Los Angeles Street, Los Angeles. 

Mr. John Salerno, formerly a draftsman (statistical) with the War Department,
is now Mathematician with the U.S. Coast and Geodetic Survey. 

Assistant Professor Henry Scheffé of the Mathematics department of Syracuse 
University has been appointed associate professor of engineering at the University 
of California at Los Angeles. Professor Scheffé has been awarded a Guggenheim 
Memorial Foundation Fellowship. 

Mr. William B. Simpson has returned from overseas and is attending the Uni- 
versity of Chicago. 

Professor George W. Tyler has returned to his position in the Mathematics
Department at Virginia Polytechnic Institute, having spent two years at the 
University of California Division of War Research. 

Professor W. Allen Wallis, who returned to his position at Stanford University 
in April after serving for nearly four years as Director of Research with the 
Statistical Research Group of Columbia University, has accepted a position as 
Professor of Statistics and Economics in the School of Business of the University 
of Chicago effective September 1, 1946. 

Mr. Frank A. Weck who served during the war as a Captain in the Office of the 
Surgeon General is now in the Actuarial division of the Metropolitan Life In- 
surance Company. 


The University of Pennsylvania held a conference on “Measurement of Consumers
Interest” at Philadelphia on May 17-18, 1946. This conference was
sponsored by the Departments of Philosophy, Psychology, Statistics, Marketing,
and Foreign Commerce. Among the speakers were the following members of
the Institute: Professor L. L. Thurstone of the University of Chicago, Professor
Louis Guttman of Cornell University, Dr. W. Edwards Deming of the Bureau
of the Budget, Professor C. West Churchman of the University of Pennsylvania,
Dr. John H. Curtiss of the National Bureau of Standards, Professor Paul Peach
of the University of North Carolina, and Professor S. S. Wilks of Princeton
University.

The following four doctorates, with mathematical statistics as a major subject, 
were conferred during 1945 in the United States. The name, University, month 
in which the degree was conferred, and the title of the dissertation are given in
each case: 

T. W. Anderson, Jr., Princeton, June, “The Non-Central Wishart distribution 
and its application to Problems in Multivariate Statistics.” 

Frances Campbell, Michigan, June, “A Study of Truncated Bivariate Normal
Distributions.” 

W. M. Chen, California, June, “Power Function of the Analysis of Variance and
Covariance of a Normal Bivariate Population.”

J. J. Livers, Michigan, February, “Use of Partitions in Multivariate Moment 
Sampling Theory.” 













Professor A. R. Crathorne of the University of Illinois, a Fellow of the In- 
stitute and one of its founders, died on March 7, 1946 at the age of 72. 




Announcement of New Preliminary Actuarial Examinations 

On June 7, 1947, three new Preliminary Actuarial Examinations will be given 
to undergraduate students of mathematics and others who may be interested in 
going into the actuarial profession. These new examinations are sponsored 
jointly by the Actuarial Society of America and the American Institute of Ac- 
tuaries. 

The new series of examinations will replace Parts 1, 2, and 3 of the actuarial 
examinations which have been given heretofore, but will carry the same credit 
toward Associateship in the two actuarial organizations. These examinations 
have been prepared under the direction of a joint committee of actuaries and 
mathematicians. They will be administered by the College Entrance Examina- 
tion Board at centers throughout the United States and Canada. 

Descriptions of the three new examinations are as follows: 

1. Language Aptitude Examination. This is a three-hour aptitude examination
testing reading comprehension and precise knowledge of the meaning of
words. It is similar to the well-known Scholastic Aptitude Test of the College 
Entrance Examination Board, except that it is pitched at approximately the 
college sophomore level. Verbal facility and command of the English language, 
as well as mathematical ability, are important in the actuarial profession. This 
is not the type of an examination for which specific preparation can be made; 
it is an aptitude rather than an achievement examination. 

2. General Mathematics Examination. This is a three-hour achievement 
examination on material usually covered in the first two years of mathematics 
in colleges and universities in the United States and Canada. More speci- 
fically, it is based on college algebra, trigonometry, analytical geometry, and 
differential and integral calculus. It is designed to be taken by the mathe- 
matically talented undergraduate at the end of his sophomore year, although 
it is not restricted to this group. 

3. Special Mathematics Examination. This is a three-hour achievement 
examination based on the material usually covered in undergraduate courses 
in finite differences, probability, and statistics. It is designed to be given at 
the end of the junior or senior year to college mathematics majors who have 
either taken courses or done concentrated reading in these fields, but it is not 
restricted to this group. 

The two actuarial bodies will jointly award one $200 and eight $100 prizes 
to the nine highest-ranking contestants on the basis of performance on the first 
two of the examinations described above. In determining these awards the 
General Mathematics Examination will be weighted twice as much as the 
Language Aptitude Examination. 




Information regarding these new examinations, and applications for taking 
them, may be obtained from either of the following organizations: 


The Actuarial Society of America 
393 Seventh Avenue 
New York 1, New York 


The American Institute of Actuaries 
720 North Michigan Avenue 
Chicago, Illinois 


Announcement of Cowles Fellowships for Women 


Two Sarah Frances Hutchinson Cowles Fellowships for women will be awarded 
by the University of Chicago for the academic year 1947-48 upon nomination by 
the Cowles Commission for Research in Economics. Applicants must be stu- 
dents of outstanding promise, preparing for the degree of master or doctor in the 
field of social sciences and statistics, preferably in quantitative economics or 
mathematical statistics. The Fellowships amount to $1000 each, but may be 
supplemented by an additional grant of $500 if the work of the Fellowship holder 
lies within the Cowles Commission’s field of interest. Holders will be expected 
to be in residence at the University of Chicago. Application and supporting 
documents must be filed before March 1, 1947. Application blanks and further 
particulars may be secured from the Cowles Commission for Research in Economics,
The University of Chicago, Chicago 37, Illinois, U.S. A.




New Members 


The following persons have been elected to membership in the Institute: 


Alger, Philip L., M.S. (Union) Staff Ass’t. to Mgr. of Eng., Gen. Elec. Co. Schenectady, 
N. Y., 1758 Wendell Ave., Schenectady 8, N. Y. 

Baer, Prof. Reinhold, Ph.D. (Gottingen) Dept. of Math. U. of Ill., Urbana, Ill.

Behrends, Stanley George, LL.B. (La Salle) Ass’t. Purchasing Agent, 439-65th St.,
Oakland 9, Calif.

Benford, Frank, B.E.E. (Michigan) Physicist, 1643 Rugby Rd., Schenectady 8, N.Y. 

Burke, H. D., Chief of Inspection and Qual. Control, The Coleman Co. Inc., Wichita 1, 
Kansas.

Church, Assoc. Prof. Randolph, Ph.D. (Yale) Postgrad. School, U. S. Naval Academy, 
Annapolis, Md., 316 N. Glen Ave., Annapolis, Maryland. 

Deihl, Douglas George, M.A. (Drake) Statistician, Tuberculosis Control Div., U. S.
Public Health Service, 3896 Porter St., N. W. Wash. 16, D.C. 

Dimsdale, Bernard, Ph.D. (Minnesota) Instr. Purdue U., 464 Washington Ave., Glencoe, 
Ill.

Eaves, James C., M.A. (Kentucky) Instr. Math. Dept. of U. of N. C., Chapel Hill, N. C. 

Elveback, Lillian R., B.A. (Minnesota) Instr. Biostatistics Dept., School of Public Health, 
Columbia Univ., 600 W. 168th St., N. Y. 32, N. Y. 

Harris, Theodore E., B.A. (Texas) Student, Graduate College, Princeton, N. J. 







Hirsch, Warren M., B.B.A. (New York) Teacher-NYC High School System, 2791 Univ. 
Ave., Bronx, N. Y.

Hughes, Harry M., M.A. (Texas) Cost Accountant, Maritime Commission, 1454 Bancroft 
Way, Berkeley 2, Calif. 

Jaramillo, Trinidad J., Ph.D. (Chicago) Research Mathematician, 1947 So. Kedzie Ave., 
Chicago 23, Ill. 

Jones, Warren E., B.A. (Maryville) Owner and Pres. of Management Controls, 699 Rose 
Ave., Des Plaines, Ill. 

Kalinowski, Walbert, Graduate Student in Math. and Statistics, 3689 W. Pine Blvd., St. 
Louis 8, Mo. 

Keeney, Roger D., A.B. (Bucknell) Actuarial Clerk, Metropolitan Life Ins. Co., N. Y., 
N. Y., 110 Fournier Crescent, East Paterson, N. J. 

Keislar, Evan R., Ph.D. (California) Instr., Princeton U., also Research Assoc. College 
Entrance Exam. Board, Nassau Club, Princeton, N.J. 

Keppler, Wharton Fields, B.A. (Ohio State) Math. Statistician, M & R Dietetic Lab., 
Inc., 8 E. Long St., Columbus 16, Ohio.

Kubis, Assoc. Prof. Joseph F., Ph.D. (Fordham) Dept. of Psychology, Fordham U. Grad. 
School, N. Y., N. Y. 

Leepin, Peter, Ph.D. (Basle) Actuary-Basler Life Ins. Co., Gellerstr. 52, Basle, Switzer- 
land. 

Lefever, Prof. David Welty, Ph.D. (S. California) Dept. of Education, U. of S. Calif., 
University Park, Los Angeles, Calif. 

Likert, Rensis, Ph.D. (Columbia) Head of the Div. of Program Surveys, B.A.E. Dept. 
of Agriculture, Washington, D. C. 

Marks, Eli S., Ph.D. (Columbia) Principal Business Economist, OPA Wash., D. C., 3711 
Horner Place S. E., Washington 20, D. C. 

Martin, Prof. William Ted, Ph.D. (Illinois) Dept. of Math., Syracuse U., Syracuse 10,
N. Y.

McGann, Paul Williamson, A.B. (Brown) Acting Section Head, Bldg., Material Equip. 
Constr. Price Div., OPA, 2700 Wisconsin Ave., N. W., Washington 7, D.C. 

Michael, William Burton, M.S. (S. California) Lecturer in Math. Psychology, Education, 
388 So. Oak Ave., Pasadena 8, Calif. 

Muench, Prof. Hugo, Dr. P.H. (J.H.U.) Dept. of Biostatistics, Harvard School of Pub. 
Health, 55 Shattuck St., Boston 15, Mass. 

Murphy, Barbara M., Librarian, of Raytheon Mfg. Co., Power Tube Div., Foundry Ave., 
Waltham 54, Mass. 

Murray, Janet H., A.M. (Stanford) Asst. Head-Family Economics Div., Bureau Human
Nutrition and Home Economics, U. S. Dept. of Ag., 1025 Connecticut Ave., Washington
6, D.C. 

Nemmers, Frederic E., M.S. (Iowa) Instr., U. of Wisconsin, 2936 N. Hackett Ave., Mil- 
waukee 11, Wisconsin. 

Neurdenburg, M. G., D.P.H. (Amsterdam) Head of the Bureau of Business-Control and 
Statistics of the Municipal Health Dept. of Amsterdam and Honorary secretary of the 
General Netherlands Society for Public Health and Social Medicine, Frans Van Mier- 
isstraat 184, Amsterdam Zuid 1, Holland. 

Noel, Roland H., M.S. (Massachusetts Col. of Pharmacy) Special Asst. to Production 
Mgr. Penicillin Div., Bristol Labs. Inc., Thompson Rd., Syracuse, N. Y. 

Nordquist, John M., M.S. (Oklahoma) Research Asst. Seismological Lab. 220 N. San Rafael 
Ave., Pasadena 2, Calif. 1695 Corson St., Pasadena 4, Calif. 

O’Connor, Howard J., M.A. (Toronto) Technical Asst., Development Div. Union Car- 
bide and Carbon Research Labs. Inc., 137-47th St., Niagara Falls, N. Y., 1016 Cleveland 
Ave., Niagara Falls, N. Y. 

Odle, John W., Ph.D. (Michigan) Head, Math. Sec., Research and Development, Naval 
Ordnance Test Station, Inyokern, Calif. 







Pascua, Asst. Prof. Marcelino, M.D. (Madrid) Dept. of Biostatistics, Johns Hopkins
Univ., 615 N. Wolfe St., Baltimore 5, Md.

Perlstein, Mae, B.A. (Hunter) Teaching Asst. U. of Calif., 2401 Durant Ave., Berkeley 4,
Calif. 

Perrott, Major Ivan Brian, M.A. (Oxford) R. Signals B.A.O.R., 17 Widney Manor Rd.,
Solihull, Warwickshire, England.

Price, Prof. Griffith Baley, Ph.D. (Harvard) Dept. of Math., 205 Frank Strong Hall, U. of 
Kansas, Lawrence, Kansas. 

Reid, David Buchanan William, B.A. (McGill U.-Montreal), Graduate Student-statistics, 
V.P.I., P. O. Box 461, Blacksburg, Virginia. 

Reynolds, John Hughes III, M.A. (U. of the South) Technical Control Statistician, Cela- 
nese Corp. of America, Tubize Div., Rome, Georgia. 

Reynolds, William A., M.A. (California) Research Associate, National Broadcasting Co., 
30 Rockefeller Plaza, New York 20, N. Y. 

Salerno, John, B.A. (Brooklyn) Draftsman (Statistical), 530 Lincoln Ave., Brooklyn 8,
N. Y.

Shaw, Byron T., Ph.D. (Ohio State) Principal Agronomist, Plant Industry Station,
Beltsville, Maryland.

Shephard, Asst. Prof. Ronald W., Ph.D. (California) Dept. of Math., Purdue U.,
Lafayette, Ind. 

Simms, Clifford Raymond, M.S. (Michigan) Consulting Actuary, 1028 Connecticut Ave.,
N. W., Washington, D. C. 

Sprengel, Herbert J., M.S. (Illinois) Quality Control Engineer, 808 N. Lombard Ave., 
Oak Park, Ill.

Stein, Charles M., B.S. (Chicago) Student, Columbia U., 109-69 Colfax St., St. Albans 11,
N. Y.

Stibitz, George R., Ph.D. (Cornell) Consultant in Applied Mathematics, 393 S. Prospect
St., Burlington, Vermont. 

Stone, John Richard Nicholas, M.A. (Cambridge) Director of the Dept. of Applied Eco- 
nomics, U. of Cambridge, England, King’s College, Cambridge, England. 

Studley, Duane Morton, Associate in Arts (Colorado) Clerk HQ 15th AF, 1311 Cheyenne 
Blvd., Colorado Springs, Colorado. 

Tweedy, Marjorie A. L., B.S. (Ohio State) Economist, Office of Price Adm., 1417 N. St. 
N.W., Washington, D.C. 

Updike, Arthur Thomas, Manager, Quality Control Dept. U. S. Naval Ordnance Plant,
Indianapolis 6, Indiana.

Vandivere, Edgar F., Jr., M.A. (Duke) Radio Engineer, Technical Information Div.,
Fed. Comm. Comm., Washington, D. C. 

Wadman, Alton J., B.S. (Mass. Inst. of Technology) Chief, Burst Pattern Analysis Section,
VT Fuze Div., NOL, 8720 Colesville Rd., Silver Spring, Md.

Watkins, Assoc. Prof. John H., Ph.D. (Yale) Dept. of Public Health, Yale School of Medi- 
cine, New Haven, Conn. 

Wurtele, Zivia S., M.A. (California) Assistant in Math. Statistics, Columbia U., 102 Lexington
Ave., N. Y. C. 16, N. Y.

Zwinggi, Prof. Ernst, Ph.D. (Berne) University of Basle, Subdirector Basler Life Ins.
Co., Kapellenstr. 28, Basle, Switzerland.





